(54) Title: HUMAN MUCOSAL ADDRESSIN CELL ADHESION MOLECULE-1 (MAdCAM-1) AND SPLICE VARIANTS THEREOF



(57) Abstract 

The present invention relates to novel MAdCAM-1 proteins, designated herein as MAdCAM-1(a-e), which are cell adhesion molecules.
In particular, isolated nucleic acid molecules are provided encoding the human MAdCAM-1(a-e) proteins. MAdCAM-1(a-e) polypeptides
are also provided, as are vectors, host cells and recombinant methods for producing the same. The invention further relates to screening
methods for identifying agonists and antagonists of MAdCAM-1(a-e) activity. Also provided are diagnostic methods for detecting cancer
or a pathological inflammatory condition, and therapeutic methods for treating an individual in need of a reduction in the activity of any of
MAdCAM-1(a-e). In another aspect, the invention provides isolated genomic DNA molecules comprising the 5 exons which comprise the
genes which encode any of MAdCAM-1(a-e), as well as the 5' flanking region which includes the promoter for these genes. In another
aspect, the invention relates to a method of screening compounds for the ability to regulate expression of any of MAdCAM-1(a-e) from
their promoter. The invention also relates to a method of selectively expressing genes on gut endothelia.

WO 98/20110

PCT/US96/17549

Number 97758 on October 10, 1996. The nucleotide sequence determined by 
sequencing portions of the deposited genomic DNA, which is shown in FIG. 6, 
includes the sequence of the 5' flanking region, given in SEQ ID NO:33, as well
as the sequences of exons 1-5, given in SEQ ID NOS:34-38, respectively. 

The invention further provides isolated MAdCAM-1 polypeptides 
(MAdCAM-1(a-e)) having an amino acid sequence encoded by a polynucleotide
described herein. 

The present invention also provides a screening method for identifying
compounds capable of enhancing or inhibiting a cellular response induced by any
of the MAdCAM-1 polypeptides (designated MAdCAM-1(a-e)), which involves
contacting cells which express the desired MAdCAM-1 polypeptides with the
candidate compound, assaying a cellular response, and comparing the cellular
response to a standard cellular response, the standard being assayed when contact
is made in the absence of the candidate compound; whereby an increased cellular
response relative to the standard indicates that the compound is an agonist and a
decreased cellular response relative to the standard indicates that the compound
is an antagonist.
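
The comparison step of this screening method can be sketched in a few lines of Python. This is only an illustration of the decision rule stated above: the function name, the numeric response values, and the optional `tolerance` band (a guard against calling assay noise an effect) are assumptions of the sketch and are not part of the disclosed method.

```python
def classify_compound(response: float, standard: float, tolerance: float = 0.0) -> str:
    """Compare a cellular response with the standard response.

    `standard` is the response assayed in the absence of the candidate
    compound. `tolerance` is a hypothetical relative noise band: responses
    within standard * (1 +/- tolerance) are reported as "no effect".
    """
    if standard <= 0:
        raise ValueError("standard response must be positive")
    if response > standard * (1 + tolerance):
        return "agonist"      # increased response relative to the standard
    if response < standard * (1 - tolerance):
        return "antagonist"   # decreased response relative to the standard
    return "no effect"


if __name__ == "__main__":
    # Illustrative numbers only; they are not measurements from the document.
    for value in (150.0, 60.0, 105.0):
        print(value, classify_compound(value, standard=100.0, tolerance=0.1))
```

In practice the threshold would be chosen from replicate measurements of the standard; the fixed relative band used here is simply the smallest rule that makes the comparison explicit.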

The invention also provides a diagnostic method useful during diagnosis 
of an inflammatory disorder. 

An additional aspect of the invention is related to a method for treating an
individual in need of a decreased level of MAdCAM-1(a-e) activity in the body
comprising administering to such an individual a composition comprising a
therapeutically effective amount of an antagonist of MAdCAM-1(a-e)-mediated
adhesion. Preferred antagonists for use in the present invention are
MAdCAM-1(a-e)-specific antibodies, as well as soluble forms of MAdCAM-1(a-e).

As the invention also includes isolated genomic DNA molecules
comprising the 5' flanking region of MAdCAM-1(a-e), including the promoter for
these genes, yet another aspect of the invention is related to a method for
identifying compounds capable of enhancing or inhibiting expression of any of
MAdCAM-1(a-e).

Because MAdCAM-1 is selectively expressed on HEV and on lamina
propria venules, the promoter can also be used to selectively target therapeutic
genes to the gut endothelia.

Brief Description of the Figures 

FIGS. 1A and 1B show the nucleotide (SEQ ID NO:1) and deduced amino
acid (SEQ ID NO:2) sequences of MAdCAM-1(a). The protein has a leader
sequence of about 17 amino acid residues (first underlined region), followed by
an extracellular domain. The second underlined region corresponds to the
transmembrane domain, and is followed by the intracellular domain. The
predicted amino acid sequence of the mature MAdCAM-1(a) protein (which lacks
the leader sequence) is also shown in FIG. 1 (SEQ ID NO:2).

FIGS. 2A and 2B show the nucleotide (SEQ ID NO:3) and deduced amino
acid (SEQ ID NO:4) sequences of MAdCAM-1(b). The protein has a leader
sequence of about 17 amino acid residues (first underlined region), followed by
an extracellular domain. The second underlined region corresponds to the
transmembrane domain, and is followed by the intracellular domain. The
predicted amino acid sequence of the mature MAdCAM-1(b) protein (which
lacks the leader sequence) is also shown in FIG. 2 (SEQ ID NO:4).

FIG. 3 shows the nucleotide (SEQ ID NO:5) and deduced amino acid
(SEQ ID NO:6) sequences of MAdCAM-1(c). The protein has a leader sequence
of about 17 amino acid residues (first underlined region), followed by an
extracellular domain. The second underlined region corresponds to the
transmembrane domain, and is followed by the intracellular domain. The
predicted amino acid sequence of the mature MAdCAM-1(c) protein (which
lacks the leader sequence) is also shown in FIG. 3 (SEQ ID NO:6).

FIGS. 4A and 4B show the nucleotide (SEQ ID NO:7) and deduced amino
acid (SEQ ID NO:8) sequences of MAdCAM-1(d). The protein has a leader
sequence of about 17 amino acid residues (first underlined region), followed by
an extracellular domain. The second underlined region corresponds to the
transmembrane domain, and is followed by the intracellular domain. The
predicted amino acid sequence of the mature MAdCAM-1(d) protein (which
lacks the leader sequence) is also shown in FIG. 4 (SEQ ID NO:8).

FIGS. 5A and 5B show the nucleotide (SEQ ID NO:9) and deduced amino
acid (SEQ ID NO:10) sequences of MAdCAM-1(e). The protein has a leader
sequence of about 17 amino acid residues (first underlined region), followed by
an extracellular domain. The second underlined region corresponds to the
transmembrane domain, and is followed by the intracellular domain. The
predicted amino acid sequence of the mature MAdCAM-1(e) protein (which lacks
the leader sequence) is also shown in FIG. 5 (SEQ ID NO:10).

FIGS. 6A and 6B show the nucleotide sequence of genomic DNA
encoding the region 5' to the gene encoding MAdCAM-1 (SEQ ID NO:33). Also
shown are exons 1-5 (SEQ ID NOS:34-38, respectively), which comprise the
genes which encode any of MAdCAM-1(a-e). Lower case letters represent intron
sequence.
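
The upper-case/lower-case convention used in FIG. 6 lends itself to a simple programmatic reading. The following Python sketch, offered only as an illustration, pulls the upper-case (exon) runs out of a genomic string and joins them into a spliced sequence. The toy sequence and function names are hypothetical; the sequence is not taken from SEQ ID NOS:33-38.

```python
import re


def extract_exons(genomic: str) -> list[str]:
    """Return the runs of upper-case letters (exon sequence), in order."""
    return re.findall(r"[A-Z]+", genomic)


def splice(genomic: str) -> str:
    """Join all exons, discarding the lower-case (intron) sequence."""
    return "".join(extract_exons(genomic))


# Hypothetical toy sequence: three exons separated by two introns.
TOY_GENOMIC = "ATGGCCgtaagtcccagGTTCTGgtgagtttcagAAATAA"

if __name__ == "__main__":
    print(extract_exons(TOY_GENOMIC))
    print(splice(TOY_GENOMIC))
```

Joining every exon models only the simplest transcript; a splice variant that uses part of an exon, or skips one, would need explicit coordinates rather than letter case.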

FIGS. 7A and 7B show the regions of similarity between the predicted
amino acid sequences of the human MAdCAM-1(a-e) proteins (SEQ ID NOS:2,
4, 6, 8, 10, respectively), mouse MAdCAM-1 (SEQ ID NO:46), and the predicted
amino acid sequence of human MAdCAM-1 from Shyjan et al., J. Immunol.
156(8):2851-2857 (1996) (SEQ ID NO:47).

FIG. 8 shows an analysis of the MAdCAM-1(a) amino acid sequence.
Alpha, beta, turn and coil regions; hydrophilicity and hydrophobicity;
amphipathic regions; flexible regions; antigenic index and surface probability are
shown. In the "Antigenic Index - Jameson-Wolf" graph, amino acid residues
52-80, 164-296 and 228-321 in FIG. 1 correspond to the shown highly antigenic
regions of the MAdCAM-1(a) protein.

FIG. 9A shows the isolation of MAdCAM-1(a) cDNA. MAdCAM-1(a)
cDNAs were initially identified as expressed sequence tags (ESTs), clones
HEBBC23X and Y, in an EST database created from an early stage human brain
cDNA library. The insert of clone HEBBC23Y was subsequently used to isolate
clone MAD-C1 from a human cosmid library. Complementary DNA encoding
the 5'-end of human MAdCAM-1(a) was obtained by PCR using PCR primers
designed from HEBBC23X and MAD-C1, yielding PCR clone PCR1-5'. The
upper figure illustrates a partial restriction map of the composite cDNA sequence
derived from the overlapping partial clones. The boxed region denotes the open
reading frame, and the restriction enzyme sites are marked with vertical lines.
FIG. 9B shows nucleotide and deduced amino acid sequence of human
MAdCAM-1(a) (SEQ ID NOS:1 and 2). The numbers in the right-hand margin
show nucleotide and amino acid positions, respectively. The initiation
methionine has been assigned to position 1 by comparison with the mouse
MAdCAM-1(a) sequence. The putative signal peptide and transmembrane
domains are underlined. The major mucin domain (residues 226 to 273) is boxed,
the minor mucin domain (residues 278 to 311) is italicized, and cysteines
expected to form disulphide bonds in the two immunoglobulin domains are
circled. A potential polyadenylation signal site is overlined.

FIGS. 10A and 10B show a comparison of the major mucin domain of
human MAdCAM-1(a) with the imperfect repeats of the mucin domain of the
intestinal mucin MUC-2. In FIG. 10A, the six octamer repeats comprising the
major mucin domain of MAdCAM-1(a) have been aligned (SEQ ID NOS:49, 50,
50, 51, 51, and 52, respectively), and shared residues are indicated by bold type.
In FIG. 10B, the six repeats of the MAdCAM-1(a) major mucin domain (SEQ ID
NO:53) and MUC-2 (SEQ ID NO:55) are optimally aligned (comparison is SEQ
ID NO:54). Identical amino acids are indicated, and conservative substitutions
are denoted (+). Numbers refer to amino acid residues.

FIGS. 11A and 11B show an identification of MAdCAM-1 splice variants
(SEQ ID NO:2). In FIG. 11A, partial sequences of MAdCAM-1 splice variants
encoding the second Ig domain and the major mucin domain or parts thereof have
been aligned. HEBBC23Y, which is missing 3 mucin repeats, was identified as
an EST. Sequences 3, 5 and 7, which are missing a major portion of the second Ig
domain and 3 to 6 mucin repeats, were isolated as PCR products following
amplification from fetal brain RNA. In FIG. 11B, sequences of acceptor and
donor splice sites in MAdCAM-1 variants are shown. Potential 5' splice donor
and 3' splice acceptor sequences identified in the four MAdCAM-1 splice variants
have been aligned (SEQ ID NOS:56-59, respectively). Sequences retained are
emboldened, whereas sequences deleted are in normal type. The sequences of the
3' acceptor sites conform well to the consensus for splice junctions, whereas the
5' splice donor sequences vary from the consensus for splice junctions.

FIG. 12 shows proposed structures for MAdCAM-1 splice variants. The 
Ig domains are shown as ovals, and the mucin domains are represented as 
decorated rods, where the minor mucin domain is less decorated.

FIG. 13 shows the DNA sequence of the 5'-flanking region of the human
MAdCAM-1 gene (SEQ ID NO:33) and comparison with the mouse MAdCAM-1
promoter (SEQ ID NO:48). Numbers refer to nucleotide positions and are
relative to the translational start codon, which is underlined. Potential
transcriptional factor binding sites identified in the human and mouse 5'-flanking
regions are underlined. Identical nucleotides shared by the human and mouse
sequences are denoted by vertical lines.

FIGS. 14A, 14B and 14C show that the 5'-flanking region of the human
MAdCAM-1 gene has promoter activity in the human dermal endothelial cell
line HMEC. Figure 14A is a schematic representation of the basic luciferase
vector pGL-2/B, and the expression vectors pGL-2/B-718+ and pGL-2/B-718-
derived from it, which contain a 700 bp 5'-flanking region (-718 to +20 relative
to the translational start) in sense and antisense orientations, respectively. Figures
14B and 14C show the relative luciferase activity directed by the expression
vectors in the human dermal endothelial cell line HMEC. The results are from
two separate experiments where promoter activity is expressed as the relative
photon count above the background control of cells transfected with no DNA. In
14B and 14C, cells were cultured in the presence or absence of PMA. Values are
the average of duplicate experiments. RT-PCR of MAdCAM-1 and
glyceraldehyde 3-phosphate dehydrogenase from HMEC cells was performed with
the U707 and L1072 primers generating the expected band of 386 bp.

Detailed Description of the Preferred Embodiments 

The present invention provides isolated nucleic acid molecules comprising
a polynucleotide encoding any one of the MAdCAM-1(a-e) polypeptides having
the amino acid sequences shown in FIGS. 1-5 (SEQ ID NOs:2, 4, 6, 8, 10),
respectively, which were determined by sequencing a cloned cDNA. The
MAdCAM-1(a-e) proteins of the present invention share sequence homology
with mouse MAdCAM-1 (FIGS. 7A and 7B) (SEQ ID NO:46). The nucleotide
sequence shown in FIG. 1 (SEQ ID NO:1) was obtained by sequencing the
HEBBC23 clone. The nucleotide sequence shown in FIG. 2 (SEQ ID NO:3) was
obtained by sequencing the HSKCW36 clone, which encodes MAdCAM-1(b),
a splicing variant of the deposited cDNA clone described below. The nucleotide
sequence shown in FIG. 3 (SEQ ID NO:5) was obtained by sequencing the
MAdCAM-1c clone, which encodes MAdCAM-1(c), a splicing variant of the
deposited cDNA clone described below. The nucleotide sequence shown in FIG.
4 (SEQ ID NO:7) was obtained by sequencing the MAdCAM-1d clone, which
encodes MAdCAM-1(d), a splicing variant of the deposited cDNA clone
described below. The nucleotide sequence shown in FIG. 5 (SEQ ID NO:9) was
obtained by sequencing the MAdCAM-1e clone, which encodes MAdCAM-1(e),
a splicing variant of the deposited cDNA clone described below.

The invention also relates to isolated genomic DNA molecules comprising
the 5 exons (all of which are shown in FIG. 6) which comprise the coding region
of any of the MAdCAM-1 splice variants (MAdCAM-1(a-e)), as well as sequence
located 5' to the start codon of the first exon, which includes the promoter for the
MAdCAM-1 splice variants. A genomic clone comprising this genomic DNA
was deposited on October 10, 1996, at the American Type Culture Collection,
12301 Park Lawn Drive, Rockville, Maryland 20852, and given accession
number 97758. The sequence of the 5' flanking region, which includes the
promoter for the genes encoding any of MAdCAM-1(a-e), is given in SEQ ID
NO:33. The sequences of exons 1-5 are given in SEQ ID NOS:34-38,
respectively. Example 6 gives further description of how the 5 exons shown in
FIG. 6, or portions thereof, can be combined in order to generate the splice
variants of MAdCAM-1.

The present invention also relates to isolated nucleic acid molecules
comprising a polynucleotide encoding the MAdCAM-1(a) polypeptide encoded
by the cDNA clone deposited in a bacterial host as ATCC Deposit Number 97759
on October 10, 1996. The deposited clone is contained in the pBluescript SK(-)
plasmid (Stratagene, La Jolla, CA).

Nucleic Acid Molecules

Unless otherwise indicated, all nucleotide sequences determined by 
sequencing a DNA molecule herein were determined using an automated DNA 
sequencer (such as the Model 373 from Applied Biosystems, Inc.), and all amino 
acid sequences of polypeptides encoded by DNA molecules determined herein 
were predicted by translation of a DNA sequence determined as above. 
Therefore, as is known in the art for any DNA sequence determined by this 
automated approach, any nucleotide sequence determined herein may contain 
some errors. Nucleotide sequences determined by automation are typically at 
least about 90% identical, more typically at least about 95% to at least about 
99.9% identical to the actual nucleotide sequence of the sequenced DNA 
molecule. The actual sequence can be more precisely determined by other 
approaches including manual DNA sequencing methods well known in the art. 
As is also known in the art, a single insertion or deletion in a determined 
nucleotide sequence compared to the actual sequence will cause a frame shift in 
translation of the nucleotide sequence such that the predicted amino acid 
sequence encoded by a determined nucleotide sequence will be completely
different from the amino acid sequence actually encoded by the sequenced DNA 
molecule, beginning at the point of such an insertion or deletion. 
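
The frame-shift effect described above is easy to reproduce. The Python sketch below translates a short toy open reading frame with the standard genetic code, deletes a single base, and translates again; every residue downstream of the deletion changes. The toy sequence and helper names are assumptions made for illustration and do not come from SEQ ID NOs:1-10.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate from position 0, ignoring any trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def delete_base(dna: str, index: int) -> str:
    """Simulate a sequencing error that drops the base at `index` (0-based)."""
    return dna[:index] + dna[index + 1:]


# Hypothetical toy reading frame: ATG GCC AAA CTG GAA TAA
TOY_ORF = "ATGGCCAAACTGGAATAA"
SHIFTED = delete_base(TOY_ORF, 3)

if __name__ == "__main__":
    print(translate(TOY_ORF))   # MAKLE*
    print(translate(SHIFTED))   # MPNWN
```

Only the initiating methionine survives the deletion, and the stop codon is lost from the shifted frame, which is why a single insertion or deletion error can alter the predicted protein length as well as its sequence.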

Using the information provided herein, such as the nucleotide sequence
in FIGS. 1-6, a nucleic acid molecule of the present invention encoding any of the
MAdCAM-1(a-e) polypeptides may be obtained using standard cloning and
screening procedures, such as those for cloning cDNAs using mRNA as starting
material. Illustrative of the invention, the nucleic acid molecules described in
FIGS. 1-5 (SEQ ID NOs:1, 3, 5, 7, 9) were discovered in a cDNA library derived
from human fetal brain cells. The genes were also identified in cDNA libraries
from the following tissues: small intestine, colon, spleen, and pancreas. The
determined nucleotide sequences of the MAdCAM-1(a-e) cDNAs of FIGS. 1-5
(SEQ ID NOs:1, 3, 5, 7, 9), respectively, contain an open reading frame encoding
a protein of 382, 366, 263, 310, and 289 amino acid residues, respectively,
wherein each of MAdCAM-1(a-e) has an initiation codon at positions 1-3 of its
respective nucleotide sequence in FIGS. 1-5 (SEQ ID NOs:1, 3, 5, 7, 9), and each
has a predicted leader sequence of about 17 amino acid residues. The mature
MAdCAM-1(a-e) polypeptides will of course lack this leader sequence. The
deduced molecular weights of complete MAdCAM-1(a-e) polypeptides are about
40, 38, 27, 32 and 32.4 kDa, respectively.
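
The relationship between the complete and mature polypeptide lengths is simple arithmetic. The short Python sketch below, a worked example and not part of the disclosure, subtracts the predicted 17-residue leader from each of the open reading frame lengths quoted above. The dictionary and function names are illustrative.

```python
LEADER_LENGTH = 17  # predicted leader length stated in the text ("about 17")

# Complete polypeptide lengths (amino acid residues) as stated in the text.
PRECURSOR_LENGTHS = {"a": 382, "b": 366, "c": 263, "d": 310, "e": 289}


def mature_length(precursor_length: int, leader: int = LEADER_LENGTH) -> int:
    """Length of the mature polypeptide once the leader is removed."""
    if leader >= precursor_length:
        raise ValueError("leader cannot be as long as the whole polypeptide")
    return precursor_length - leader


MATURE_LENGTHS = {
    variant: mature_length(length) for variant, length in PRECURSOR_LENGTHS.items()
}

if __name__ == "__main__":
    for variant, length in MATURE_LENGTHS.items():
        # The first residue of the mature form is leader + 1, i.e. residue 18.
        print(f"MAdCAM-1({variant}): residues {LEADER_LENGTH + 1}-"
              f"{PRECURSOR_LENGTHS[variant]} ({length} residues)")
```

Because the leader length is itself a prediction, these mature lengths shift by the same amount if the actual cleavage site differs from the predicted one.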

In another aspect, the invention relates to isolated genomic DNA
molecules comprising the 5 exons which comprise the coding region of any of the
MAdCAM-1 splice variants (MAdCAM-1(a-e)), as well as sequence located 5'
to the start codon of the first exon, which includes the promoter for the
MAdCAM-1 splice variants. The sequence of the 5' flanking region, which
includes the promoter for the genes encoding any of MAdCAM-1(a-e), is given
in SEQ ID NO:33. The sequences of exons 1-5 are given in SEQ ID NOS:34-38,
respectively.

In another aspect, the invention provides isolated nucleic acid molecules 
comprising the genomic DNA sequence contained in the clone deposited as 
ATCC Deposit No. 97758 on October 10, 1996. 

The present invention also relates to isolated nucleic acid molecules
comprising a polynucleotide encoding the MAdCAM-1(a) polypeptide encoded
by the cDNA clone deposited in a bacterial host as ATCC Deposit Number 97759
on October 10, 1996. The nucleotide sequence determined by sequencing the
deposited cDNA clone, MAdCAM-1(a), which is shown in FIG. 1 (SEQ ID
NO:1), contains an open reading frame encoding a polypeptide of 382 amino acid
residues, including an initiation codon at nucleotide positions 1-3, with a leader
sequence of about 17 amino acid residues, and a predicted molecular weight of
about 40 kDa. The amino acid sequence of the mature MAdCAM-1(a) protein
is shown in FIG. 1, amino acid residues 18-382 (SEQ ID NO:2).

As indicated, the present invention also provides the mature form(s) of the
MAdCAM-1(a-e) proteins of the present invention. According to the signal
hypothesis, proteins secreted by mammalian cells have a signal or secretory
leader sequence which is cleaved from the mature protein once export of the
growing protein chain across the rough endoplasmic reticulum has been initiated.
Most mammalian cells and even insect cells cleave secreted proteins with the
same specificity. However, in some cases, cleavage of a secreted protein is not
entirely uniform, which results in two or more mature species of the protein.
Further, it has long been known that the cleavage specificity of a secreted protein
is ultimately determined by the primary structure of the complete protein, that is,
it is inherent in the amino acid sequence of the polypeptide. Therefore, the
present invention provides a nucleotide sequence encoding the mature
MAdCAM-1(a-e) polypeptides having the amino acid sequence encoded by the
cDNA clone shown in Figures 1-5 (SEQ ID NOS:2, 4, 6, 8, 10). By the mature
MAdCAM-1(a-e) proteins shown in FIGS. 1-5 is meant the mature form(s) of the
MAdCAM-1 proteins produced by expression in a mammalian cell (e.g., COS
cells, as described below) of the complete open reading frame encoded by the
human DNA sequence of the cDNA clone contained in the vector in the deposited
host. As indicated below, the actual mature MAdCAM-1(a-e) polypeptides may
or may not differ from the predicted "mature" MAdCAM-1(a-e) polypeptides
shown in FIGS. 1-5, depending on the accuracy of the predicted cleavage site
based on computer analysis.

Methods for predicting whether a protein has a secretory leader as well as
the cleavage point for that leader sequence are available. For instance, the
methods of McGeoch (Virus Res. 3:271-286 (1985)) and von Heinje (Nucleic
Acids Res. 14:4683-4690 (1986)) can be used. The accuracy of predicting the
cleavage points of known mammalian secretory proteins for each of these
methods is in the range of 75-80%. von Heinje, supra. However, the two
methods do not always produce the same predicted cleavage point(s) for a given
protein.

In the present case, the predicted amino acid sequences of the complete
MAdCAM-1(a-e) polypeptides of the present invention were analyzed by a
computer program ("PSORT") (K. Nakai and M. Kanehisa, Genomics 14:897-911
(1992)), which is an expert system for predicting the cellular location of a protein
based on the amino acid sequence. As part of this computational prediction of
localization, the methods of McGeoch and von Heinje are incorporated. The
analysis by the PSORT program predicted the cleavage sites between amino acids
17 and 18 in Figures 1-5 (SEQ ID NOS:2, 4, 6, 8, 10). Thereafter, the complete
amino acid sequences were further analyzed by visual inspection, applying a
simple form of the (-1,-3) rule of von Heinje. von Heinje, supra. Thus, the leader
sequence for any of the native MAdCAM-1(a-e) proteins is predicted to consist
of amino acid residues 1-17 in Figures 1-5 (SEQ ID NOS:2, 4, 6, 8, 10), while the
predicted mature native MAdCAM-1(a-e) proteins begin at residue 18.
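
A simple form of the (-1,-3) rule can be written as a short check. The Python sketch below is an assumption-laden illustration, not the PSORT implementation and not the inventors' procedure. It takes the residue at -1 to be small (A, G, S, C, T or Q) and requires that the residue at -3 not be aromatic, charged, or a large polar residue; these residue sets are one common simplified reading of the rule. The precursor sequence is hypothetical and is not SEQ ID NO:2.

```python
# Simplified residue sets (assumptions of this sketch, see lead-in).
SMALL_AT_MINUS_1 = set("AGSCTQ")
FORBIDDEN_AT_MINUS_3 = set("FHYW" "DEKR" "NQ")  # aromatic, charged, large polar


def satisfies_rule(precursor: str, leader_length: int) -> bool:
    """True if cleavage after residue `leader_length` (1-based) fits the rule."""
    if leader_length < 3 or leader_length >= len(precursor):
        return False
    minus_1 = precursor[leader_length - 1]   # last residue of the leader
    minus_3 = precursor[leader_length - 3]   # third residue from the end of the leader
    return minus_1 in SMALL_AT_MINUS_1 and minus_3 not in FORBIDDEN_AT_MINUS_3


def candidate_sites(precursor: str, shortest: int = 14, longest: int = 22) -> list[int]:
    """Leader lengths in the given window that satisfy the rule."""
    return [n for n in range(shortest, longest + 1) if satisfies_rule(precursor, n)]


# Hypothetical 24-residue precursor with a hydrophobic core and Ala at position 17.
HYPOTHETICAL_PRECURSOR = "MKW" + "L" * 11 + "VLAELRPKVL"

if __name__ == "__main__":
    for n in candidate_sites(HYPOTHETICAL_PRECURSOR):
        print(f"leader 1-{n}, mature protein starts at residue {n + 1}:",
              HYPOTHETICAL_PRECURSOR[n:])
```

The default window of 14 to 22 residues mirrors the range of leader lengths the text allows for; on real sequences the rule often leaves several candidate sites, which is one reason predicted and actual cleavage points can differ.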

As one of ordinary skill would appreciate, due to sequencing errors, the
predicted leader sequence of the MAdCAM-1(a-e) proteins of the present
invention is predicted to be about 17 amino acids in length, but may be
anywhere in the range of about 14 to about 22 amino acids.

As one of ordinary skill would appreciate, due to the possibilities of
sequencing errors discussed above, as well as the variability of cleavage sites for
leaders in different known proteins, the predicted polypeptide corresponding to
MAdCAM-1(a) comprises about 382 amino acids, but may be anywhere in the
range of 368-396 amino acids. The predicted polypeptide corresponding to
MAdCAM-1(b) comprises about 366 amino acids, but may be anywhere in the
range of 348-382 amino acids. The predicted polypeptide corresponding to
MAdCAM-1(c) comprises about 263 amino acids, but may be anywhere in the
range of 250-276 amino acids. The predicted polypeptide corresponding to
MAdCAM-1(d) comprises about 310 amino acids, but may be anywhere in the
range of 294-325 amino acids. The predicted polypeptide corresponding to
MAdCAM-1(e) comprises about 289 amino acids, but may be anywhere in the
range of 275-304 amino acids.

As indicated, nucleic acid molecules of the present invention may be in
the form of RNA, such as mRNA, or in the form of DNA, including, for instance,
cDNA and genomic DNA obtained by cloning or produced synthetically. The
DNA may be double-stranded or single-stranded. Single-stranded DNA or RNA
may be the coding strand, also known as the sense strand, or it may be the
non-coding strand, also referred to as the anti-sense strand.

By "isolated" nucleic acid molecule(s) is intended a nucleic acid molecule,
DNA or RNA, which has been removed from its native environment. For
example, recombinant DNA molecules contained in a vector are considered
isolated for the purposes of the present invention. Further examples of isolated
DNA molecules include recombinant DNA molecules maintained in heterologous
host cells or purified (partially or substantially) DNA molecules in solution.
Isolated RNA molecules include in vivo or in vitro RNA transcripts of the DNA
molecules of the present invention. Isolated nucleic acid molecules according to
the present invention further include such molecules produced synthetically.

Isolated nucleic acid molecules of the present invention include DNA
molecules comprising an open reading frame (ORF) shown in FIGS. 1-5 (SEQ
ID NOs:1, 3, 5, 7, 9), respectively; a DNA molecule comprising the coding
sequence for the mature MAdCAM-1(a) protein shown in FIG. 1 (last 365 amino
acids) (SEQ ID NO:2); a DNA molecule comprising the coding sequence for the
mature MAdCAM-1(b) protein shown in FIG. 2 (last 349 amino acids) (SEQ ID
NO:4); a DNA molecule comprising the coding sequence for the mature
MAdCAM-1(c) protein shown in FIG. 3 (last 246 amino acids) (SEQ ID NO:6);
a DNA molecule comprising the coding sequence for the mature MAdCAM-1(d)
protein shown in FIG. 4 (last 293 amino acids) (SEQ ID NO:8); and a DNA
molecule comprising the coding sequence for the mature MAdCAM-1(e) protein
shown in FIG. 5 (last 272 amino acids) (SEQ ID NO:10). The invention also
includes DNA molecules which comprise a sequence substantially different from
those described above but which, due to the degeneracy of the genetic code, still
encode any of the MAdCAM-1(a-e) proteins. Of course, the genetic code is well
known in the art. Thus, it would be routine for one skilled in the art to generate
such degenerate variants.
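
To make the notion of degenerate variants concrete, the following Python sketch counts and enumerates the DNA sequences that encode a given peptide under the standard genetic code. It is only an illustration: the peptides used are arbitrary short examples and are not MAdCAM-1 sequence, and the function names are hypothetical.

```python
from itertools import product
from math import prod

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Reverse lookup: amino acid -> list of synonymous codons.
CODONS_FOR: dict[str, list[str]] = {}
for codon, aa in CODON_TABLE.items():
    CODONS_FOR.setdefault(aa, []).append(codon)


def count_encodings(peptide: str) -> int:
    """Number of distinct DNA sequences encoding `peptide`."""
    return prod(len(CODONS_FOR[aa]) for aa in peptide.upper())


def all_encodings(peptide: str) -> list[str]:
    """Every DNA sequence encoding `peptide` (practical for short peptides only)."""
    choices = (CODONS_FOR[aa] for aa in peptide.upper())
    return ["".join(codons) for codons in product(*choices)]


if __name__ == "__main__":
    print(count_encodings("MW"))      # 1: Met and Trp each have a single codon
    print(count_encodings("HHHHHH"))  # 64: two codons per histidine
    print(all_encodings("MH"))        # ['ATGCAT', 'ATGCAC']
```

The count grows multiplicatively with peptide length, so for a polypeptide of several hundred residues the set of degenerate coding sequences is far too large to list, although any single member is straightforward to write down.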

The invention further provides an isolated nucleic acid molecule having
the nucleotide sequence shown in FIGS. 1-6 (SEQ ID NOs:1, 3, 5, 7, 9, 33, 34,
35, 36, 37, and 38, respectively), or a nucleic acid molecule having a sequence
complementary to one of the above sequences. Such isolated molecules,
particularly DNA molecules, are useful as probes for gene mapping, by in situ
hybridization with chromosomes, and for detecting expression of the
MAdCAM-1(a-e) genes in human tissue, for instance, by northern blot analysis.
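
A sequence complementary to a given DNA strand, written 5' to 3', is its reverse complement. The minimal Python sketch below shows the operation on an arbitrary toy sequence; the function name is illustrative and the sequence is not taken from the figures.

```python
# Watson-Crick pairing for DNA, preserving letter case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the complementary strand, read 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    probe_target = "ATGGCC"  # toy sequence
    print(reverse_complement(probe_target))  # GGCCAT
```

Applying the function twice returns the original sequence, which is a convenient sanity check when designing antisense probes.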

The present invention is further directed to fragments of the isolated
nucleic acid molecules described herein. By a fragment of an isolated nucleic
acid molecule having the nucleotide sequence of the nucleotide sequences shown
in FIGS. 1-6 (SEQ ID NOs:1, 3, 5, 7, 9, 33, 34, 35, 36, 37, and 38, respectively),
is intended fragments at least about 15 nt, and more preferably at least about 20
nt, still more preferably at least about 30 nt, and even more preferably, at least
about 40 nt in length which are useful as diagnostic probes and primers as
discussed herein. Of course, larger fragments 50-1150 nt in length are also useful
according to the present invention as are fragments corresponding to most, if not
all, of the nucleotide sequence shown in FIGS. 1-6 (SEQ ID NOs:1, 3, 5, 7, 9, 33,
34, 35, 36, 37, and 38, respectively). By a fragment at least 20 nt in length, for
example, is intended fragments which include 20 or more contiguous bases from
the nucleotide sequence of the nucleotide sequences as shown in FIGS. 1-6 (SEQ
ID NOs:1, 3, 5, 7, 9, 33, 34, 35, 36, 37, and 38, respectively).
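
The "20 or more contiguous bases" criterion can be expressed as a sliding-window test. In the Python sketch below, given as an illustration under stated assumptions, a candidate molecule qualifies if it contains any window of `min_len` consecutive bases from the reference sequence. The reference here is a made-up 36-nt toy string and the function name is hypothetical.

```python
def shares_contiguous_run(candidate: str, reference: str, min_len: int = 20) -> bool:
    """True if `candidate` contains at least `min_len` contiguous reference bases."""
    if min_len <= 0:
        raise ValueError("min_len must be positive")
    candidate, reference = candidate.upper(), reference.upper()
    # Slide a window of min_len bases along the reference and look for it in
    # the candidate; a reference shorter than min_len yields no windows.
    windows = (reference[i:i + min_len] for i in range(len(reference) - min_len + 1))
    return any(window in candidate for window in windows)


# Hypothetical toy reference (36 nt); not a sequence from FIGS. 1-6.
TOY_REFERENCE = "ATGCGTACGTTAGCCTAGGATCCGATTACGGATCCA"

if __name__ == "__main__":
    long_fragment = "GG" + TOY_REFERENCE[5:27] + "TT"   # carries 22 reference bases
    short_fragment = TOY_REFERENCE[:15]                  # carries only 15
    print(shares_contiguous_run(long_fragment, TOY_REFERENCE))        # True
    print(shares_contiguous_run(short_fragment, TOY_REFERENCE))       # False
    print(shares_contiguous_run(short_fragment, TOY_REFERENCE, 15))   # True
```

The test considers one strand only; checking the reverse complement of the candidate as well would cover fragments derived from the complementary strand.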

Preferred nucleic acid fragments of the present invention include nucleic
acid molecules encoding epitope-bearing portions, or the transmembrane domain,
or the extracellular domain, or the intracellular domain, of the MAdCAM-1(a-e)
proteins. In particular, such nucleic acid fragments of the present invention
include nucleic acid molecules encoding: a polypeptide comprising amino acid
residues from about 52 to about 80 in FIG. 1 (SEQ ID NO:2); a polypeptide
comprising amino acid residues from about 164 to about 196 in FIG. 1 (SEQ ID
NO:2); and a polypeptide comprising amino acid residues from about 278 to
about 321 in FIG. 1 (SEQ ID NO:2). (The inventors have determined that the
above polypeptide fragments are antigenic regions of the MAdCAM-1(a-e)
proteins. Methods for determining other such epitope-bearing portions of the
MAdCAM-1(a-e) proteins are described in detail below.) Other preferred nucleic
acid fragments include the genomic region 5' to the MAdCAM-1 gene
(nucleotide residues 1 through 718 of SEQ ID NO:33), and fragments which
correspond to exon 1 (nucleotide residues 1-52 of SEQ ID NO:34), exon 2
(nucleotide residues 11-295 of SEQ ID NO:35), exon 3 (nucleotide residues
11-340 of SEQ ID NO:36), exon 4 (nucleotide residues 11-343 of SEQ ID NO:37),
and exon 5 (nucleotide residues 11-608 of SEQ ID NO:38), all of which are
shown in FIG. 6. Knowledge of the exon-intron boundaries (see FIG. 6 and
Example 6), which clearly mark functional domains in the molecule, will be
helpful in designing variant forms of MAdCAM-1 for use in therapy (see below).

In another aspect, the invention provides an isolated nucleic acid molecule
comprising a polynucleotide which hybridizes under stringent hybridization
conditions to a portion of the polynucleotide in a nucleic acid molecule of the
invention described above. By "stringent hybridization conditions" is intended
overnight incubation at 42°C in a solution comprising: 50% formamide, 5x SSC
(1x SSC = 150 mM NaCl, 15 mM trisodium citrate), 50 mM sodium phosphate
(pH 7.6), 5x Denhardt's solution, 10% dextran sulfate, and 20 µg/ml denatured,
sheared salmon sperm DNA, followed by washing the filters in 0.1x SSC at
about 65°C.
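
The salt concentrations implied by the "x SSC" notation follow by simple scaling. The Python sketch below assumes the commonly used definition of 1x SSC as 150 mM NaCl and 15 mM trisodium citrate and computes the concentrations in the 5x hybridization solution and the 0.1x wash. The names are illustrative and the rounding is only there to keep floating-point noise out of the output.

```python
# Assumed composition of 1x SSC, in millimolar.
ONE_X_SSC_MM = {"NaCl": 150.0, "trisodium citrate": 15.0}


def ssc_concentrations(fold: float) -> dict[str, float]:
    """Millimolar concentration of each SSC component at `fold` x strength."""
    if fold <= 0:
        raise ValueError("fold must be positive")
    return {name: round(mm * fold, 3) for name, mm in ONE_X_SSC_MM.items()}


if __name__ == "__main__":
    print("hybridization (5x):", ssc_concentrations(5))
    print("wash (0.1x):", ssc_concentrations(0.1))
```

The drop in salt between the hybridization and wash steps, together with the higher wash temperature, is what makes the wash the more stringent of the two steps.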

By a polynucleotide which hybridizes to a "portion" of a polynucleotide
is intended a polynucleotide (either DNA or RNA) hybridizing to at least about
15 nucleotides (nt), and more preferably at least about 20 nt, still more preferably
at least about 30 nt, and even more preferably about 30-70 nt of the reference
polynucleotide. These are useful as diagnostic probes and primers as discussed
above and in more detail below.

By a portion of a polynucleotide of "at least 20 nt in length," for example,
is intended 20 or more contiguous nucleotides from the nucleotide sequence of
the reference polynucleotide (e.g., the nucleotide sequences as shown in FIGS.
1-6 (SEQ ID NOs:1, 3, 5, 7, 9, 33, 34, 35, 36, 37, and 38, respectively)).

Of course, a polynucleotide which hybridizes only to a poly(A) sequence
(such as the 3' terminal poly(A) tract of any of the MAdCAM-1(a-e) cDNAs
shown in FIGS. 1-5 (SEQ ID NOs:1, 3, 5, 7, 9, respectively)), or to a
complementary stretch of T (or U) residues, would not be included in a
polynucleotide of the invention used to hybridize to a portion of a nucleic acid of
the invention, since such a polynucleotide would hybridize to any nucleic acid
molecule containing a poly(A) stretch or the complement thereof (e.g.,
practically any double-stranded cDNA clone).
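
The exclusion just described amounts to rejecting probes that consist solely of A residues or solely of T (or U) residues. A minimal Python sketch of such a filter follows; the function name and the example probes are hypothetical, and the check covers only the pure homopolymer case named in the text.

```python
def is_homopolymer_probe(probe: str) -> bool:
    """True if the probe is only a poly(A) tract or only a poly(T)/poly(U) tract."""
    bases = set(probe.upper())
    if not bases:
        return False  # an empty string is not a probe at all
    return bases == {"A"} or bases <= {"T", "U"}


if __name__ == "__main__":
    for probe in ("AAAAAAAAAAAAAAAAAAAA", "TTTTTTTTTTTTTTT", "ATGGCCAAACTGGAA"):
        verdict = "excluded" if is_homopolymer_probe(probe) else "allowed"
        print(probe, verdict)
```

A probe that merely contains a poly(A) stretch alongside gene-specific sequence passes this filter, consistent with the text, which excludes only polynucleotides that hybridize to the poly(A) tract alone.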

As indicated, nucleic acid molecules of the present invention which
encode any of the MAdCAM-1(a-e) polypeptides may include, but are not limited
to, those encoding the amino acid sequence of the mature polypeptides, by
themselves; the coding sequence for the mature polypeptides and additional
sequences, such as those encoding the about 17 amino acid leader or secretory
sequence, such as a pre-, or pro- or prepro-protein sequence; the coding sequence
of the mature polypeptide, with or without the aforementioned additional coding
sequences, together with additional, non-coding sequences, including for
example, but not limited to introns and non-coding 5' and 3' sequences, such as
the transcribed, non-translated sequences that play a role in transcription, mRNA
processing, including splicing and polyadenylation signals, for example,
ribosome binding and stability of mRNA; an additional coding sequence which
codes for additional amino acids, such as those which provide additional
functionalities. Thus, the sequence encoding the polypeptide may be fused to a
marker sequence, such as a sequence encoding a peptide which facilitates
purification of the fused polypeptide. In certain preferred embodiments of this
aspect of the invention, the marker amino acid sequence is a hexa-histidine
peptide, such as the tag provided in a pQE vector (Qiagen, Inc.), among others,
many of which are commercially available. As described in Gentz et al., Proc.
Natl. Acad. Sci. USA 86:821-824 (1989), for instance, hexa-histidine provides for
convenient purification of the fusion protein. The "HA" tag is another peptide
useful for purification which corresponds to an epitope derived from the influenza
hemagglutinin protein, which has been described by Wilson et al., Cell 37:767
(1984). As discussed below, other such fusion proteins include any of the
MAdCAM-1(a-e) polypeptides fused to Fc at the N- or C-terminus.

The present invention further relates to variants of the nucleic acid
molecules of the present invention, which encode portions, analogs or derivatives
of the MAdCAM-1(a-e) proteins. Variants may occur naturally, such as a natural
allelic variant. By an "allelic variant" is intended one of several alternate forms
of a gene occupying a given locus on a chromosome of an organism. Genes II,
Lewin, B., ed., John Wiley & Sons, New York (1985). Non-naturally occurring
variants may be produced using art-known mutagenesis techniques.

Such variants include those produced by nucleotide substitutions,
deletions or additions, which may involve one or more nucleotides. The variants
may be altered in coding regions, non-coding regions, or both. Alterations in the
coding regions may produce conservative or non-conservative amino acid
substitutions, deletions or additions. Especially preferred among these are silent
substitutions, additions and deletions, which do not alter the properties and
activities of the MAdCAM-1(a-e) proteins or portions thereof. Also especially
preferred in this regard are conservative substitutions.

Further embodiments of the invention include isolated nucleic acid
molecules comprising a polynucleotide having a nucleotide sequence at least 90%
identical, and more preferably at least 95%, 96%, 97%, 98% or 99% identical to
(a) a nucleotide sequence encoding the full-length MAdCAM-1(a-e)
polypeptides having the complete amino acid sequence in FIGS. 1-5 (SEQ ID
NOs:2, 4, 6, 8, 10, respectively), including the predicted leader sequence; (b) a
nucleotide sequence encoding the mature MAdCAM-1(a-e) polypeptides
(full-length polypeptide with the leader removed) having the amino acid
sequences at positions 18-382 in FIG. 1 (SEQ ID NO:2), 18-366 in FIG. 2 (SEQ
ID NO:4), 18-263 in FIG. 3 (SEQ ID NO:6), 18-310 in FIG. 4 (SEQ ID NO:8),
or 18-290 in FIG. 5 (SEQ ID NO:10); (c) a nucleotide sequence encoding a
polypeptide comprising the transmembrane domain of any of the MAdCAM-1
polypeptides (MAdCAM-1(a-e)); (d) a nucleotide sequence encoding a
polypeptide comprising the extracellular domain of any of the MAdCAM-1
polypeptides (MAdCAM-1(a-e)); (e) a nucleotide sequence encoding a
polypeptide comprising the intracellular domain of any of the MAdCAM-1
polypeptides (MAdCAM-1(a-e)); (f) a nucleotide sequence comprising the
MAdCAM-1 promoter, wherein the nucleotide sequence is given in SEQ ID
NO:33; (g) a nucleotide sequence encoding exon 1, 2, 3, 4 or 5 of MAdCAM-1,
having the sequence given in SEQ ID NOS:34, 35, 36, 37 and 38, respectively;
and (h) a nucleotide sequence complementary to any of the nucleotide sequences
in (a), (b), (c), (d), (e), (f) or (g), above.
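
As a purely illustrative aid, the mature-polypeptide positions recited in item (b) can be turned into lengths and slices with a few lines of code. The positions are those stated above, read as 1-based and inclusive (an assumption about the numbering convention); the function names and the placeholder sequence used in the check are hypothetical.

```python
# Mature polypeptide positions from item (b), keyed by SEQ ID NO.
# Assumed to be 1-based and inclusive.
MATURE_POSITIONS = {
    2: (18, 382),
    4: (18, 366),
    6: (18, 263),
    8: (18, 310),
    10: (18, 290),
}


def mature_length(start: int, end: int) -> int:
    """Number of residues in an inclusive 1-based range."""
    return end - start + 1


def mature_region(full_length: str, start: int, end: int) -> str:
    """Slice the mature polypeptide out of a full-length sequence string."""
    return full_length[start - 1:end]


# Lengths implied by the stated positions, in SEQ ID NO order.
MATURE_LENGTHS = {seq_id: mature_length(*pos) for seq_id, pos in MATURE_POSITIONS.items()}
```

On this reading, a leader of 17 residues precedes each mature polypeptide, consistent with the "about 17 amino acid leader" mentioned earlier.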

By a polynucleotide having a nucleotide sequence at least, for example,
95% "identical" to a reference nucleotide sequence encoding any of the
MAdCAM-1(a-e) polypeptides is intended that the nucleotide sequence of the
polynucleotide is identical to the reference sequence except that the
polynucleotide sequence may include up to five point mutations per each 100
nucleotides of the reference nucleotide sequence encoding any of the
MAdCAM-1(a-e) polypeptides. In other words, to obtain a polynucleotide having a
nucleotide sequence at least 95% identical to a reference nucleotide sequence, up 
to 5% of the nucleotides in the reference sequence may be deleted or substituted 
with another nucleotide, or a number of nucleotides up to 5% of the total 
nucleotides in the reference sequence may be inserted into the reference sequence. 
These mutations of the reference sequence may occur at the 5' or 3' terminal 
positions of the reference nucleotide sequence or anywhere between those 
terminal positions, interspersed either individually among nucleotides in the 
reference sequence or in one or more contiguous groups within the reference 
sequence. 
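
The arithmetic behind this definition can be sketched as follows. This is a minimal illustration, assuming that all point mutations (substitutions, deletions and insertions) are simply counted together against the length of the reference sequence; the function names are hypothetical.

```python
def max_allowed_differences(reference_length: int, percent_identity: int) -> int:
    """Largest number of point mutations compatible with the stated identity.

    For 95% identity this is five per 100 reference nucleotides.
    Integer arithmetic avoids floating-point rounding at the boundary.
    """
    return reference_length * (100 - percent_identity) // 100


def meets_identity(reference_length: int, differences: int, percent_identity: int) -> bool:
    """True if the number of differences stays within the identity threshold."""
    return differences * 100 <= reference_length * (100 - percent_identity)
```

For a hypothetical 1,200 nt reference sequence, for example, up to 60 point mutations would remain within the "at least 95% identical" definition, while 61 would not.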

As a practical matter, whether any particular nucleic acid molecule is at
least 90%, 95%, 96%, 97%, 98% or 99% identical to, for instance, the nucleotide
sequences shown in FIGS. 1-6 or to the nucleotide sequence of the deposited
genomic clone, or to the deposited cDNA clone, can be determined
conventionally using known computer programs such as the Bestfit program
(Wisconsin Sequence Analysis Package, Version 8 for Unix, Genetics Computer 
Group, University Research Park, 575 Science Drive, Madison, WI 53711). 
Bestfit uses the local homology algorithm of Smith and Waterman, Advances in 
Applied Mathematics 2: 482-489 (1981), to find the best segment of homology 
between two sequences. When using Bestfit or any other sequence alignment 
program to determine whether a particular sequence is, for instance, 95% 
identical to a reference sequence according to the present invention, the 
parameters are set, of course, such that the percentage of identity is calculated 
over the full length of the reference nucleotide sequence and that gaps in 
homology of up to 5% of the total number of nucleotides in the reference 
sequence are allowed. 
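
For orientation, the local homology algorithm of Smith and Waterman referred to above can be sketched in a few lines. This is not the Bestfit program and does not reproduce its scoring parameters; the match, mismatch and linear gap scores below are arbitrary assumptions, and the helper that reports identity over the full length of the reference is a hypothetical illustration of the parameter setting described above.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Return (best score, aligned segment of a, aligned segment of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the best cell until a zero score ends the local alignment.
    i, j = best_pos
    seg_a, seg_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + step:
            seg_a.append(a[i - 1]); seg_b.append(b[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            seg_a.append(a[i - 1]); seg_b.append("-"); i -= 1
        else:
            seg_a.append("-"); seg_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(seg_a)), "".join(reversed(seg_b))


def identity_over_reference(reference: str, aligned_ref: str, aligned_query: str) -> float:
    """Percent of reference positions matched, over the full reference length."""
    matches = sum(1 for r, q in zip(aligned_ref, aligned_query) if r == q and r != "-")
    return 100.0 * matches / len(reference)
```

Dividing by the full reference length, rather than by the length of the aligned segment, reflects the requirement that identity be calculated over the full length of the reference sequence.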

The present application is directed to nucleic acid molecules at least 90%,
95%, 96%, 97%, 98% or 99% identical to the nucleic acid sequences shown in
FIGS. 1-6 (SEQ ID NOs:1, 3, 5, 7, 9, 33, 34, 35, 36, 37, and 38, respectively),
or to the nucleic acid sequence of the deposited genomic DNA, irrespective of
whether they encode a polypeptide having the activity of any of
MAdCAM-1(a-e). This is because even where a particular nucleic acid molecule
does not encode a polypeptide having MAdCAM-1(a-e) activity, one of skill in
the art would still know how to use the nucleic acid molecule, for instance, as a
hybridization probe or a polymerase chain reaction (PCR) primer. Uses of the
nucleic acid molecules of the present invention that do not encode a polypeptide
having the activity of any of MAdCAM-1(a-e) include, inter alia, (1) isolating
the gene encoding MAdCAM-1(a-e) or allelic variants thereof in a cDNA library;
(2) in situ hybridization (e.g., "FISH") to metaphase chromosomal spreads to
provide precise chromosomal location of the gene encoding MAdCAM-1(a-e),
as described in Verma et al., Human Chromosomes: A Manual of Basic
Techniques, Pergamon Press, New York (1988); and (3) Northern blot analysis for
detecting mRNA expression of any of MAdCAM-1(a-e) in specific tissues.

Preferred, however, are nucleic acid molecules having sequences at least
90%, 95%, 96%, 97%, 98% or 99% identical to any of the nucleic acid sequences
shown in FIGS. 1-6 (SEQ ID NOs:1, 3, 5, 7, 9, 33, 34, 35, 36, 37, and 38,
respectively), or to the nucleic acid sequence of the deposited genomic DNA
which does, in fact, encode a polypeptide having the protein activity of any of
MAdCAM-1(a-e). By "a polypeptide having the protein activity of any of
MAdCAM-1(a-e)" is intended polypeptides exhibiting activity similar, but not
necessarily identical, to an activity of any of the MAdCAM-1(a-e) proteins of the
invention (either the full-length proteins or, preferably, the mature proteins), as
measured in a particular biological assay. For example, the protein activity of
any of MAdCAM-1(a-e) can be measured by using a variation of the
Stamper-Woodruff in vitro lymphocyte-endothelial cell binding assay (J. Exp. Med.
144:828-833 (1976)), which tests the ability of lymphoid cells expressing α4β7 to
bind to vascular endothelial cells expressing a polypeptide suspected of having
the activity of any of the MAdCAM-1(a-e) proteins (Hanninen et al., J. Clin.
Invest. 92:2509-2515 (1993)). Briefly, the assay involves contacting a cell which
expresses α4β7 (such as TK1 cells) and thus binds to cells expressing any of
MAdCAM-1(a-e), with cells expressing any of the MAdCAM-1(a-e) molecules
of the invention, and measuring the resultant adhesion between the two types of
cells. Thus, a cell expressing the protein activity of any of MAdCAM-1(a-e) will
bind to the cells expressing α4β7, while a cell expressing a protein which does not
bind to α4β7 will be considered not to have the activity of any of
MAdCAM-1(a-e).

Of course, due to the degeneracy of the genetic code, one of ordinary skill
in the art will immediately recognize that a large number of the nucleic acid
molecules having a sequence at least 90%, 95%, 96%, 97%, 98%, or 99%
identical to the nucleic acid sequences shown in FIGS. 1-5 (SEQ ID NOs:1, 3, 5,
7, 9, respectively) will encode a polypeptide "having the protein activity of any
of MAdCAM-1(a-e)." In fact, since degenerate variants of these nucleotide
sequences all encode the same polypeptide, this will be clear to the skilled artisan
even without performing the above described comparison assay. It will be further
recognized in the art that, for such nucleic acid molecules that are not degenerate
variants, a reasonable number will also encode a polypeptide having the protein
activity of any of MAdCAM-1(a-e). This is because the skilled artisan is fully
aware of amino acid substitutions that are either less likely or not likely to
significantly affect protein function (e.g., replacing one aliphatic amino acid with
a second aliphatic amino acid).
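
The point about degenerate variants can be illustrated with a short sketch that translates two different nucleotide sequences using the standard genetic code and confirms that they encode the same polypeptide. The nine-nucleotide example sequences are made up for illustration and are not taken from the figures; the function names are hypothetical.

```python
# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence; '*' marks a stop codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore any trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def is_degenerate_variant(seq_a: str, seq_b: str) -> bool:
    """True if the sequences differ at the nucleotide level but encode the same polypeptide."""
    return seq_a.upper() != seq_b.upper() and translate(seq_a) == translate(seq_b)
```

Here the made-up sequences ATGGCTCTG and ATGGCCTTA differ at three of nine nucleotides, yet both translate to Met-Ala-Leu.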

For example, guidance concerning how to make phenotypically silent
amino acid substitutions is provided in Bowie, J. U. et al., "Deciphering the
Message in Protein Sequences: Tolerance to Amino Acid Substitutions," Science
247:1306-1310 (1990), wherein the authors indicate that proteins are surprisingly
tolerant of amino acid substitutions.



Vectors and Host Cells



The present invention also relates to vectors which include the isolated
DNA molecules of the present invention, host cells which are genetically
engineered with the recombinant vectors, and the production of any of the
MAdCAM-1(a-e) polypeptides or fragments thereof by recombinant techniques.

The polynucleotides may be joined to a vector containing a selectable 
marker for propagation in a host. Generally, a plasmid vector is introduced in a 
precipitate, such as a calcium phosphate precipitate, or in a complex with a 
charged lipid. If the vector is a virus, it may be packaged in vitro using an 
appropriate packaging cell line and then transduced into host cells. 

The DNA insert should be operatively linked to an appropriate promoter,
such as the phage lambda PL promoter, the E. coli lac, trp and tac promoters, the
SV40 early and late promoters and promoters of retroviral LTRs, to name a few.
Other suitable promoters will be known to the skilled artisan. The expression
constructs will further contain sites for transcription initiation, termination and,
in the transcribed region, a ribosome binding site for translation. The coding
portion of the mature transcripts expressed by the constructs will preferably
include a translation initiation codon at the beginning and a termination codon
(UAA, UGA or UAG) appropriately positioned at the end of the polypeptide to be
translated.
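
A minimal sketch of checking that the coding portion of a transcript is bounded in this way is given below. It assumes AUG as the initiation codon (the text above does not name it) and treats any in-frame termination codon before the end as disqualifying; the function name is hypothetical.

```python
STOP_CODONS = {"UAA", "UGA", "UAG"}


def has_start_and_stop(coding_portion: str, start_codon: str = "AUG") -> bool:
    """Check that a coding portion begins with an initiation codon and ends
    with a single, correctly positioned termination codon."""
    rna = coding_portion.upper().replace("T", "U")  # accept DNA or RNA letters
    if len(rna) < 6 or len(rna) % 3 != 0:
        return False
    codons = [rna[i:i + 3] for i in range(0, len(rna), 3)]
    no_internal_stop = not any(codon in STOP_CODONS for codon in codons[1:-1])
    return codons[0] == start_codon and codons[-1] in STOP_CODONS and no_internal_stop
```

The check is deliberately simple and says nothing about promoter linkage or the ribosome binding site, which would have to be examined separately.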

As indicated, the expression vectors will preferably include at least one 
selectable marker. Such markers include dihydrofolate reductase or neomycin 
resistance for eukaryotic cell culture and tetracycline or ampicillin resistance 
genes for culturing in E. coli and other bacteria. Representative examples of 
appropriate hosts include, but are not limited to, bacterial cells, such as E. coli, 
Streptomyces and Salmonella typhimurium cells; fungal cells, such as yeast cells; 
insect cells such as Drosophila S2 and Spodoptera Sf9 cells; animal cells such as 
CHO, COS and Bowes melanoma cells; and plant cells. Appropriate culture 
mediums and conditions for the above-described host cells are known in the art. 

Vectors preferred for use in bacteria include pQE70, pQE60 and
pQE-9, available from Qiagen; pBS vectors, Phagescript vectors, Bluescript
vectors, pNH8A, pNH16a, pNH18A, pNH46A, available from Stratagene; and
ptrc99a, pKK223-3, pKK233-3, pDR540, pRIT5 available from Pharmacia.
Among preferred eukaryotic vectors are pWLNEO, pSV2CAT, pOG44, pXT1
and pSG available from Stratagene; and pSVK3, pBPV, pMSG and pSVL
available from Pharmacia. Other suitable vectors will be readily apparent to the
skilled artisan.

Introduction of the construct into the host cell can be effected by calcium 
phosphate transfection, DEAE-dextran mediated transfection, cationic 
lipid-mediated transfection, electroporation, transduction, infection or other 
methods. Such methods are described in many standard laboratory manuals, such 
as Davis et al., Basic Methods In Molecular Biology (1986).

The polypeptide may be expressed in a modified form, such as a fusion
protein, and may include not only secretion signals, but also additional
heterologous functional regions. For instance, a region of additional amino acids,
particularly charged amino acids, may be added to the N-terminus of the
polypeptide to improve stability and persistence in the host cell, during
purification, or during subsequent handling and storage. Also, peptide moieties
may be added to the polypeptide to facilitate purification. Such regions may be
removed prior to final preparation of the polypeptide. The addition of peptide
moieties to polypeptides to engender secretion or excretion, to improve stability
and to facilitate purification, among others, are familiar and routine techniques
in the art. A preferred fusion protein comprises a heterologous region from
immunoglobulin that is useful to solubilize proteins. For example, EP-A-0 464
533 (Canadian counterpart 2045869) discloses fusion proteins comprising various
portions of constant region of immunoglobulin molecules together with another
human protein or part thereof. In many cases, the Fc part in a fusion protein is
thoroughly advantageous for use in therapy and diagnosis and thus results, for
example, in improved pharmacokinetic properties (EP-A 0232 262). On the other
hand, for some uses it would be desirable to be able to delete the Fc part after the
fusion protein has been expressed, detected and purified in the advantageous
manner described. This is the case when the Fc portion proves to be a hindrance to
use in therapy and diagnosis, for example when the fusion protein is to be used
as antigen for immunizations. In drug discovery, for example, human proteins,
such as hIL-5, have been fused with Fc portions for the purpose of high-throughput
screening assays to identify antagonists of hIL-5. See D. Bennett et al., Journal
of Molecular Recognition, Vol. 8, pp. 52-58 (1995) and K. Johanson et al., The
Journal of Biological Chemistry, Vol. 270, No. 16, pp. 9459-9471 (1995).

The MAdCAM-1(a-e) proteins can be recovered and purified from
recombinant cell cultures by well-known methods including ammonium sulfate
or ethanol precipitation, acid extraction, anion or cation exchange
chromatography, phosphocellulose chromatography, hydrophobic interaction
chromatography, affinity chromatography, hydroxylapatite chromatography and
lectin chromatography. Most preferably, high performance liquid
chromatography ("HPLC") is employed for purification. Polypeptides of the
present invention include naturally purified products, products of chemical
synthetic procedures, and products produced by recombinant techniques from a
prokaryotic or eukaryotic host, including, for example, bacterial, yeast, higher
plant, insect and mammalian cells. Depending upon the host employed in a
recombinant production procedure, the polypeptides of the present invention may
be glycosylated or may be non-glycosylated. In addition, polypeptides of the
invention may also include an initial modified methionine residue, in some cases
as a result of host-mediated processes.

MAdCAM-1(a-e) Polypeptides and Fragments

The invention further provides isolated MAdCAM-1(a-e) polypeptides
having the amino acid sequence given in FIGS. 1-5 (SEQ ID NOS:2, 4, 6, 8, 10,
respectively), or a peptide or polypeptide comprising a portion of the above
polypeptides, as well as any of the polypeptides encoded by the nucleotide
sequence of exons 1-5 of FIG. 6 (SEQ ID NOS:34-38).

It will be recognized in the art that some amino acid sequences of the
MAdCAM-1(a-e) polypeptides can be varied without significant effect on the
structure or function of the protein. If such differences in sequence are
contemplated, it should be remembered that there will be critical areas on the 
protein which determine activity. 

Thus, the invention further includes variations of the MAdCAM-1(a-e)
polypeptides which show substantial MAdCAM-1(a-e) polypeptide activity or
which include regions of any of the MAdCAM-1(a-e) proteins such as the protein
portions discussed below. Such mutants include deletions, insertions, inversions,
repeats, and type substitutions. As indicated above, further guidance concerning
which amino acid changes are likely to be phenotypically silent can be found in
Bowie, J.U., et al., "Deciphering the Message in Protein Sequences: Tolerance
to Amino Acid Substitutions," Science 247:1306-1310 (1990).

Thus, the fragment, derivative or analog of the polypeptide shown in
FIGS. 1-5 (SEQ ID NOS:2, 4, 6, 8, 10) may be (i) one in which one or more of
the amino acid residues are substituted with a conserved or non-conserved amino
acid residue (preferably a conserved amino acid residue) and such substituted
amino acid residue may or may not be one encoded by the genetic code, or (ii)
one in which one or more of the amino acid residues includes a substituent group,
or (iii) one in which the mature polypeptide is fused with another compound, such
as a compound to increase the half-life of the polypeptide (for example,
polyethylene glycol), or (iv) one in which the additional amino acids are fused to
the mature polypeptide, such as an IgG Fc fusion region peptide or leader or
secretory sequence or a sequence which is employed for purification of the
mature polypeptide or a proprotein sequence. Such fragments, derivatives and
analogs are deemed to be within the scope of those skilled in the art from the
teachings herein.

Of particular interest are substitutions of charged amino acids with
another charged amino acid and with neutral or negatively charged amino acids.
The latter results in proteins with reduced positive charge to improve the
characteristics of the MAdCAM-1(a-e) proteins. The prevention of aggregation
is highly desirable. Aggregation of proteins not only results in a loss of activity
but can also be problematic when preparing pharmaceutical formulations, because
aggregates can be immunogenic. (Pinckard et al., Clin. Exp. Immunol. 2:331-340
(1967); Robbins et al., Diabetes 36:838-845 (1987); Cleland et al., Crit. Rev.
Therapeutic Drug Carrier Systems 10:307-377 (1993)).

As indicated, changes are preferably of a minor nature, such as 
conservative amino acid substitutions that do not significantly affect the folding 
or activity of the protein (see Table 1).

TABLE 1. Conservative Amino Acid Substitutions.

Aromatic        Phenylalanine
                Tryptophan
                Tyrosine

Hydrophobic     Leucine
                Isoleucine
                Valine

Polar           Glutamine
                Asparagine

Basic           Arginine
                Lysine
                Histidine

Acidic          Aspartic Acid
                Glutamic Acid

Small           Alanine
                Serine
                Threonine
                Methionine
                Glycine
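
For illustration, the groupings of Table 1 can be encoded as a lookup, so that a proposed substitution can be checked against the table. The sketch below uses standard one-letter amino acid codes; the function name is hypothetical, and residues absent from Table 1 (for example cysteine and proline) are simply treated as having no conservative partner.

```python
# Table 1 groupings, written with one-letter amino acid codes.
CONSERVATIVE_GROUPS = {
    "Aromatic": {"F", "W", "Y"},
    "Hydrophobic": {"L", "I", "V"},
    "Polar": {"Q", "N"},
    "Basic": {"R", "K", "H"},
    "Acidic": {"D", "E"},
    "Small": {"A", "S", "T", "M", "G"},
}


def is_conservative(original: str, replacement: str) -> bool:
    """True if both residues differ and fall within the same Table 1 group."""
    original, replacement = original.upper(), replacement.upper()
    if original == replacement:
        return False  # not a substitution at all
    return any(
        original in group and replacement in group
        for group in CONSERVATIVE_GROUPS.values()
    )
```

Under this encoding, leucine to isoleucine is conservative, whereas leucine to aspartic acid is not.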



Amino acids in the MAdCAM-1(a-e) polypeptides of the present
invention that are essential for function can be identified by methods known in
the art, such as site-directed mutagenesis or alanine-scanning mutagenesis
(Cunningham and Wells, Science 244:1081-1085 (1989)). The latter procedure
introduces single alanine mutations at every residue in the molecule. The
resulting mutant molecules are then tested for biological activity such as receptor
binding or in vitro proliferative activity. Sites that are critical for
protein activity can also be determined by structural analysis such as
crystallization, nuclear magnetic resonance or photoaffinity labeling (Smith et al.,
J. Mol. Biol. 224:899-904 (1992) and de Vos et al., Science 255:306-312 (1992)).
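
The enumeration step of alanine-scanning mutagenesis can be sketched as follows. The sketch only lists the single-alanine mutants of a sequence; it assumes that residues which are already alanine are skipped, and the function name and the four-residue example are hypothetical. Testing the mutants for activity is outside its scope.

```python
def alanine_scan(sequence: str):
    """List (mutation label, mutant sequence) for each single alanine substitution.

    Labels use the common form <original residue><1-based position>A.
    Positions that are already alanine are skipped (an assumption).
    """
    sequence = sequence.upper()
    mutants = []
    for index, residue in enumerate(sequence):
        if residue == "A":
            continue
        mutant = sequence[:index] + "A" + sequence[index + 1:]
        mutants.append((f"{residue}{index + 1}A", mutant))
    return mutants
```

For the made-up peptide MKAL, this yields the three mutants M1A, K2A and L4A.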

The polypeptides of the present invention are preferably provided in an
isolated form, and preferably are substantially purified. A recombinantly
produced version of any of the MAdCAM-1(a-e) polypeptides can be
substantially purified by the one-step method described in Smith and Johnson,
Gene 67:31-40 (1988).

The polypeptides of the present invention include any of the polypeptides
of FIGS. 1-5 (SEQ ID NOS:2, 4, 6, 8, 10, respectively) including the leader, any
of the polypeptides of FIGS. 1-5 (SEQ ID NOS:2, 4, 6, 8, 10, respectively)
minus the leader (i.e., the mature protein), the
extracellular domain of any of the polypeptides of FIGS. 1-5 (SEQ ID NOS:2, 4,
6, 8, 10, respectively), the intracellular domain of any of the polypeptides of
FIGS. 1-5 (SEQ ID NOS:2, 4, 6, 8, 10, respectively), and the transmembrane
domain of any of the polypeptides of FIGS. 1-5 (SEQ ID NOS:2, 4, 6, 8, 10,
respectively), as well as any of the polypeptides encoded by the nucleotide
sequence of exons 1-5 of FIG. 6 (SEQ ID NOS:34-38). Of course, those of
ordinary skill will understand that, just as the splicing variants MAdCAM-1(a-e)
are generated in vivo by alternative splicing of the 5 exons shown in FIG. 6 (SEQ
ID NOS:34-38) (as well as by splicing internal to those exons, see Example 6),
polypeptide variants of MAdCAM-1 can be recombinantly prepared by
combining exons, or portions of exons, of the sequences shown in FIG. 6 (SEQ
ID NOS:34-38). Such polypeptides are also included in the invention. Also
included are polypeptides which are at least 80% identical, more preferably at
least 90% or 95% identical, still more preferably at least 96%, 97%, 98% or 99%
identical to the above-mentioned polypeptides, and also include portions of such
polypeptides with at least 30 amino acids and more preferably at least 50 amino
acids.

By a polypeptide having an amino acid sequence at least, for example,
95% "identical" to a reference amino acid sequence of any of the
MAdCAM-1(a-e) polypeptides is intended that the amino acid sequence of the
polypeptide is identical to the reference sequence except that the polypeptide
sequence may include up to five amino acid alterations per each 100 amino acids
of the reference amino acid sequence of any of the MAdCAM-1(a-e) polypeptides. In
other words, to obtain a polypeptide having an amino acid sequence at least 95%
identical to a reference amino acid sequence, up to 5% of the amino acid residues
in the reference sequence may be deleted or substituted with another amino acid,
or a number of amino acids up to 5% of the total amino acid residues in the
reference sequence may be inserted into the reference sequence. These alterations
of the reference sequence may occur at the amino or carboxy terminal positions
of the reference amino acid sequence or anywhere between those terminal
positions, interspersed either individually among residues in the reference
sequence or in one or more contiguous groups within the reference sequence.

As a practical matter, whether any particular polypeptide is at least 90%,
95%, 96%, 97%, 98% or 99% identical to, for instance, any of the amino acid
sequences shown in FIGS. 1-6 (SEQ ID NOs:2, 4, 6, 8, 10, respectively), or to the
amino acid sequence encoded by deposited genomic DNA, can be determined
conventionally using known computer programs such as the Bestfit program
(Wisconsin Sequence Analysis Package, Version 8 for Unix, Genetics Computer 
Group, University Research Park, 575 Science Drive, Madison, WI 53711). 
When using Bestfit or any other sequence alignment program to determine 
whether a particular sequence is, for instance, 95% identical to a reference 
sequence according to the present invention, the parameters are set, of course, 
such that the percentage of identity is calculated over the full length of the 
reference amino acid sequence and that gaps in homology of up to 5% of the total 
number of amino acid residues in the reference sequence are allowed. 

The polypeptide of the present invention could be used as a molecular 
weight marker on SDS-PAGE gels or on molecular sieve gel filtration columns 
using methods well known to those of skill in the art. 

In another aspect, the invention provides a peptide or polypeptide
comprising an epitope-bearing portion of the invention described herein. The
epitope of this polypeptide portion is an immunogenic or antigenic epitope of a
polypeptide of the invention. An "immunogenic epitope" is defined as a part of
a protein that elicits an antibody response when the whole protein is the
immunogen. On the other hand, a region of a protein molecule to which an
antibody can bind is defined as an "antigenic epitope." The number of
immunogenic epitopes of a protein generally is less than the number of antigenic
epitopes. See, for instance, Geysen et al., Proc. Natl. Acad. Sci. USA
81:3998-4002 (1983).

As to the selection of peptides or polypeptides bearing an antigenic
epitope (i.e., that contain a region of a protein molecule to which an antibody can
bind), it is well known in the art that relatively short synthetic peptides that
mimic part of a protein sequence are routinely capable of eliciting an antiserum
that reacts with the partially mimicked protein. See, for instance, Sutcliffe, J. G.,
Shinnick, T. M., Green, N. and Lerner, R.A. (1983) Antibodies that react with
predetermined sites on proteins. Science 219:660-666. Peptides capable of
eliciting protein-reactive sera are frequently represented in the primary sequence
of a protein, can be characterized by a set of simple chemical rules, and are
confined neither to immunodominant regions of intact proteins (i.e.,
immunogenic epitopes) nor to the amino or carboxyl terminals.

Antigenic epitope-bearing peptides and polypeptides of the invention are
therefore useful to raise antibodies, including monoclonal antibodies, that bind
specifically to a polypeptide of the invention. See, for instance, Wilson et al.,
Cell 37:767-778 (1984) at 777.

Antigenic epitope-bearing peptides and polypeptides of the invention
preferably contain a sequence of at least seven, more preferably at least nine and
most preferably between about 15 and about 30 amino acids
contained within the amino acid sequence of a polypeptide of the invention.

Non-limiting examples of antigenic polypeptides or peptides that can be
used to generate antibodies specific to any of the MAdCAM-1(a-e) polypeptides
include: a polypeptide comprising amino acid residues from about 52 to about 80
in FIG. 1 (SEQ ID NO:2); a polypeptide comprising amino acid residues from
about 164 to about 196 in FIG. 1 (SEQ ID NO:2); and a polypeptide comprising
amino acid residues from about 228 to about 321 in FIG. 1 (SEQ ID NO:2). As
indicated above, the inventors have determined that the above polypeptide
fragments are antigenic regions of the MAdCAM-1 protein.

The epitope-bearing peptides and polypeptides of the invention may be
produced by any conventional means. Houghten, R. A. (1985) General method
for the rapid solid-phase synthesis of large numbers of peptides: specificity of
antigen-antibody interaction at the level of individual amino acids. Proc. Natl.
Acad. Sci. USA 82:5131-5135. This "Simultaneous Multiple Peptide Synthesis
(SMPS)" process is further described in U.S. Patent No. 4,631,211 to Houghten
et al. (1986).

MAdCAM-1 Related Disorder Diagnosis

Under circumstances which induce an inflammatory response, circulating
lymphocytes expressing a receptor for one or more of the MAdCAM-1 proteins
(MAdCAM-1(a-e)) are believed to bind to the MAdCAM-1 protein on mucosal
venules, and then migrate through the venules to the epithelium, where acute
inflammation results. Therefore, the invention also relates to the diagnosis of a
pathological inflammatory condition by identifying the presence of an enhanced
level of one or more of the MAdCAM-1(a-e) proteins or mRNA encoding these
proteins, as compared to a corresponding "standard" mammal, i.e., a mammal of
the same species not having the pathological inflammatory condition. Such
conditions include transplantation rejection, arthritis, rheumatoid arthritis,
infection, dermatosis, inflammatory bowel disease, and autoimmune disease,
including chronic relapsing experimental autoimmune encephalitis (EAE).

It is also believed that certain tissues in mammals with cancer express
significantly enhanced levels of one or more of the MAdCAM-1(a-e) proteins and
mRNA encoding these proteins when compared to a corresponding "standard"
mammal, i.e., a mammal of the same species not having the cancer. Further, it
is believed that enhanced levels of any of the MAdCAM-1(a-e) proteins can be
detected in certain body fluids (e.g., sera, plasma, urine, and spinal fluid) from
mammals with cancer when compared to sera from mammals of the same species
not having the cancer. Thus, the invention provides a diagnostic method useful
during tumor diagnosis, which involves assaying the expression level of the gene
encoding any of the MAdCAM-1(a-e) proteins in mammalian cells or body fluid
and comparing the gene expression level with a standard expression level for that
same gene, whereby an increase in the gene expression level over the standard is
indicative of certain tumors.

Where a tumor diagnosis has already been made according to 
conventional methods, the present invention is useful as a prognostic indicator, 
whereby patients exhibiting enhanced expression of any of the MAdCAM-1(a-e) 
genes will experience a worse clinical outcome relative to patients expressing the 
relevant gene at a lower level. 

By "assaying the expression level of the gene encoding one or more of the 
MAdCAM-1(a-e) proteins" is intended qualitatively or quantitatively measuring 
or estimating the level of one or more of the MAdCAM-1(a-e) proteins or the 
level of the mRNA encoding one or more of the MAdCAM-1(a-e) proteins in a 
first biological sample either directly (e.g., by determining or estimating absolute 
protein level or mRNA level) or relatively (e.g., by comparing to the protein level 
or mRNA level of the same MAdCAM-1(a-e) in a second biological sample). 

Preferably, the level of the MAdCAM-1(a-e) protein or mRNA level in 
the first biological sample is measured or estimated and compared to a standard 
protein level or mRNA level for the same protein, the standard being taken from 
a second biological sample obtained from an individual not having the cancer. 
As will be appreciated in the art, once a standard protein level or mRNA level for 
one or more of MAdCAM-1(a-e) is known, it can be used repeatedly as a 
standard for comparison. 

By "biological sample" is intended any biological sample obtained from 
an individual, cell line, tissue culture, or other source which contains one or more 
of the MAdCAM-1(a-e) proteins or the mRNA encoding them. Biological 
samples include mammalian body fluids (such as sera, plasma, urine, synovial 
fluid and spinal fluid) which contain a secreted mature protein, and ovarian, 
prostate, heart, placenta, pancreas, liver, spleen, lung, breast and umbilical tissue. 

The present invention is useful for detecting cancer in mammals. In 
particular, the invention is useful during diagnosis of the following types of 
cancers in mammals: lymphoma, leukemia, and metastatic tumors. Preferred 
mammals include monkeys, apes, cats, dogs, cows, pigs, horses, rabbits and 
humans. Particularly preferred are humans. 

Total cellular RNA can be isolated from a biological sample using the 
single-step guanidinium-thiocyanate-phenol-chloroform method described in 
Chomczynski and Sacchi, Anal. Biochem. 162:156-159 (1987). Levels of mRNA 
encoding any of the MAdCAM-1(a-e) proteins are then assayed using any 
appropriate method. These include Northern blot analysis, S1 nuclease mapping, 
the polymerase chain reaction (PCR), reverse transcription in combination with 
the polymerase chain reaction (RT-PCR), and reverse transcription in 
combination with the ligase chain reaction (RT-LCR). 

Assaying protein levels of any of MAdCAM-1(a-e) in a biological sample 
can occur using antibody-based techniques. For example, expression of any of 
the MAdCAM-1(a-e) polypeptides in tissues can be studied with classical 
immunohistological methods (Jalkanen, M., et al., J. Cell. Biol. 101:976-985 
(1985); Jalkanen, M., et al., J. Cell. Biol. 105:3087-3096 (1987)). 

Other antibody-based methods useful for detecting MAdCAM-1(a-e) 
protein gene expression include immunoassays, such as the enzyme linked 
immunosorbent assay (ELISA) and the radioimmunoassay (RIA). Suitable 
labels are known in the art, and include enzyme labels, such as glucose oxidase; 
radioisotopes, such as iodine, carbon (¹⁴C), sulfur (³⁵S), tritium (³H), 
indium, and technetium (⁹⁹ᵐTc); fluorescent labels, such as fluorescein 
and rhodamine; and biotin. 



Chromosome Assays 

The nucleic acid molecules of the present invention are also valuable for 
chromosome identification. The sequence is specifically targeted to and can 
hybridize with human chromosome 19p13.3. The mapping of DNAs to 
chromosomes according to the present invention is an important first step in 
correlating those sequences with genes associated with disease. 

In certain preferred embodiments in this regard, the cDNA herein 
disclosed is used to clone genomic DNA of any of the genes encoding 
MAdCAM-1(a-e) proteins. This can be accomplished using a variety of well 
known techniques and libraries, which generally are available commercially. The 
genomic DNA then is used for in situ chromosome mapping using well known 
techniques for this purpose. 

In addition, in some cases, sequences can be mapped to chromosomes by 
preparing PCR primers (preferably 15-25 bp) from the cDNA. Computer analysis 
of the 3' untranslated region of the gene is used to rapidly select primers that do 
not span more than one exon in the genomic DNA, since primers spanning an 
exon boundary would complicate the amplification process. These primers are 
then used for PCR screening of somatic cell hybrids containing individual human 
chromosomes. 
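
The primer constraint just described (a length of 15-25 bp and no primer spanning more than one exon) can be illustrated with a minimal Python sketch. The function names, the toy cDNA and the exon coordinates are hypothetical and are not taken from the sequences disclosed herein; practical primer design would also weigh factors such as melting temperature, which this sketch ignores.

```python
from typing import Iterator


def within_one_exon(start: int, length: int, exons: list[tuple[int, int]]) -> bool:
    """True if the window [start, start + length) lies entirely inside one exon."""
    end = start + length
    return any(lo <= start and end <= hi for lo, hi in exons)


def candidate_primers(cdna: str, exons: list[tuple[int, int]],
                      min_len: int = 15, max_len: int = 25) -> Iterator[tuple[int, str]]:
    """Yield (start, sequence) for each window of allowed length within one exon."""
    for length in range(min_len, max_len + 1):
        for start in range(len(cdna) - length + 1):
            if within_one_exon(start, length, exons):
                yield start, cdna[start:start + length]


# Toy 40-nt cDNA with two hypothetical 20-nt exons (half-open coordinates).
toy_cdna = "ACGTTGCAAGCTTGGCATCA" + "GGATCCTTAGCAGTCCATGA"
toy_exons = [(0, 20), (20, 40)]
primers = list(candidate_primers(toy_cdna, toy_exons))
print(len(primers), "candidate windows; first:", primers[0])
```

Windows that cross the junction at position 20 are rejected, so every candidate retained by the sketch can anneal to a single exon of the genomic DNA.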

Fluorescence in situ hybridization ("FISH") of a cDNA clone to a 
metaphase chromosomal spread can be used to provide a precise chromosomal 
location in one step. This technique can be used with probes from the cDNA as 
short as 50 or 60 bp. For a review of this technique, see Verma et al., Human 
Chromosomes: A Manual Of Basic Techniques, Pergamon Press, New York 
(1988). 

Once a sequence has been mapped to a precise chromosomal location, the 
physical position of the sequence on the chromosome can be correlated with 
genetic map data. Such data are found, for example, in V. McKusick, Mendelian 
Inheritance In Man, available on-line through Johns Hopkins University, Welch 
Medical Library. The relationships between genes and diseases that have been 
mapped to the same chromosomal region are then identified through linkage 
analysis (coinheritance of physically adjacent genes). 

Next, it is necessary to determine the differences in the cDNA or genomic 
sequence between affected and unaffected individuals. If a mutation is observed 
in some or all of the affected individuals but not in any normal individuals, then 
the mutation is likely to be the causative agent of the disease. 

MAdCAM-1 Protein and Antibody Therapy 

Under circumstances which induce an inflammatory response, circulating 
lymphocytes are believed to express a receptor for one or more of the MAdCAM-1 
proteins (MAdCAM-1(a-e)), bind to the MAdCAM-1 protein on mucosal 
venules via this receptor, and then migrate through the venules to the epithelium, 
where acute inflammation results. Therefore, the administration of a therapeutic 
composition capable of blocking the migration of leukocytes via MAdCAM-1 
polypeptides (MAdCAM-1(a-e)) (i.e., an antagonist of the activity of any of 
MAdCAM-1(a-e)) could be an effective therapeutic treatment for minimizing 
tissue damage in many abnormal inflammatory conditions, especially where the 
inflammation is chronic or acute. Such conditions include transplantation 
rejection, arthritis, rheumatoid arthritis, infection, dermatosis, inflammatory 
bowel disease, and autoimmune disease, including chronic relapsing experimental 
autoimmune encephalitis (EAE). 

Thus, the invention also relates to a therapeutic method for treating an 
individual in need of a reduction in the activity of any of MAdCAM-1(a-e) by 
administering to the individual a therapeutically effective amount of a 
composition comprising an antagonist of MAdCAM-1(a-e) activity. Such 
compounds include anti-MAdCAM-1 antibodies or fragments thereof, as well as 
compounds such as solubilized α4β7. Such individuals can include those 
suffering from abnormal inflammatory conditions, especially where the 
inflammation is chronic or acute. The invention also includes using such 
compositions as a "preventative" treatment before detection of an inflammatory 
state, so as to prevent the development of inflammation in a patient at high risk 
for the same, such as, for example, transplant patients. 

Therefore, the invention is further directed to antibody-based therapies 
which involve administering an antibody directed against any of 
MAdCAM-1(a-e) to a mammalian, preferably human, patient for treating one or 
more of the above-described disorders. Methods for producing such 
anti-MAdCAM-1 polyclonal and monoclonal antibodies are described in detail 
above. Such antibodies may be provided in pharmaceutically acceptable 
compositions as known in the art or as described herein. 

A summary of the ways in which the antibodies of the present invention 
may be used therapeutically includes binding any of the MAdCAM-1(a-e) 
polypeptides locally or systemically in the body. Some of these approaches are 
described in more detail below. Armed with the teachings provided herein, one 
of ordinary skill in the art will know how to use the antibodies of the present 
invention for diagnostic, monitoring or therapeutic purposes without undue 
experimentation. 

The antagonists of MAdCAM-1(a-e) activity of the invention may also 
include soluble forms of any of the MAdCAM-1(a-e) polypeptides. The 
administration of soluble forms of any of the MAdCAM-1(a-e) polypeptides may 
block leukocyte adhesion to endothelium at sites of inflammation. Those of skill 
in the art will readily know how to generate such soluble fragments based on an 
analysis of the MAdCAM-1 three dimensional structure such as that given in 
FIG. 7. 

Modes of administration 

It will be appreciated that conditions caused by an increase in the standard 
or normal level of activity of any of MAdCAM-1(a-e) in an individual can be 
treated by administration of a molecule capable of blocking lymphocyte adhesion 
that is mediated by any of MAdCAM-1(a-e). Thus, the invention further provides 
a method of treating an individual in need of a decreased level of 
MAdCAM-1(a-e)-mediated adhesion comprising administering to such an 
individual a pharmaceutical composition comprising an effective amount of an 
antagonist of any of the MAdCAM-1(a-e) polypeptides of the invention. Such 
antagonists include anti-MAdCAM-1 antibodies or fragments or derivatives 
thereof, as well as compounds such as solubilized α4β7, or soluble forms of any 
of MAdCAM-1(a-e), which are effective to decrease the activity level of the 
desired MAdCAM-1(a-e) protein in such an individual. 

As a general proposition, the total pharmaceutically effective amount of 
one or more of the antagonists, including antibodies, soluble forms of α4β7, and 
soluble forms of the MAdCAM-1(a-e) polypeptides, administered parenterally 
per dose will be in the range of about 1 µg/kg/day to 10 mg/kg/day of patient 
body weight, although, as noted above, this will be subject to therapeutic 
discretion. More preferably, this dose is at least 0.01 mg/kg/day, and most 
preferably for humans between about 0.01 and 1 mg/kg/day for the antagonist. If 
given continuously, the desired antagonist of the MAdCAM-1(a-e) polypeptides 
is typically administered at a dose rate of about 1 µg/kg/hour to about 50 
µg/kg/hour, either by 1-4 injections per day or by continuous subcutaneous 
infusions, for example, using a mini-pump. An intravenous bag solution may 
also be employed. 
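
The per-kilogram figures above translate into absolute amounts by simple multiplication. The following Python sketch is illustrative arithmetic only: the function names and the 70 kg body weight are assumptions made for the example, and the amounts actually administered remain subject to therapeutic discretion as stated above.

```python
# Bounds taken from the paragraph above, expressed in micrograms.
DAILY_LOW_UG_PER_KG = 1          # about 1 ug/kg/day
DAILY_HIGH_UG_PER_KG = 10_000    # 10 mg/kg/day
INFUSION_LOW_UG_PER_KG_H = 1     # about 1 ug/kg/hour
INFUSION_HIGH_UG_PER_KG_H = 50   # about 50 ug/kg/hour


def daily_dose_range_ug(weight_kg: int) -> tuple[int, int]:
    """Total daily amount (ug) spanned by the stated per-kilogram range."""
    return weight_kg * DAILY_LOW_UG_PER_KG, weight_kg * DAILY_HIGH_UG_PER_KG


def infusion_total_ug(weight_kg: int, rate_ug_per_kg_h: int, hours: int = 24) -> int:
    """Amount (ug) delivered by a continuous infusion at a constant rate."""
    return weight_kg * rate_ug_per_kg_h * hours


weight = 70  # hypothetical body weight in kg
print(daily_dose_range_ug(weight))
print(infusion_total_ug(weight, INFUSION_LOW_UG_PER_KG_H),
      infusion_total_ug(weight, INFUSION_HIGH_UG_PER_KG_H))
```

For the assumed 70 kg individual, the daily range works out to 70 µg through 700 mg, and a 24-hour infusion at the stated rates delivers 1.68 mg through 84 mg.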

Pharmaceutical compositions containing one or more of the antagonists 
of the MAdCAM-1(a-e) polypeptides of the invention may be administered 
orally, rectally, parenterally, intracisternally, intravaginally, intraperitoneally, 
topically (as by powders, ointments, drops or transdermal patch), buccally, or as 
an oral or nasal spray. By "pharmaceutically acceptable carrier" is meant a 
non-toxic solid, semisolid or liquid filler, diluent, encapsulating material or 
formulation auxiliary of any type. The term "parenteral" as used herein refers to 
modes of administration which include intravenous, intramuscular, 
intraperitoneal, intrasternal, subcutaneous and intraarticular injection and 
infusion. 

Where the antagonist to be used is an antibody, fragment thereof, or 
derivative thereof, it is preferred to use high affinity and/or potent in vivo 
MAdCAM-1-inhibiting and/or neutralizing antibodies, fragments or regions 
thereof, for both MAdCAM-1 immunoassays (see the section of this application 
directed to diagnostics) and therapy of MAdCAM-1 related disorders. Such 
antibodies, fragments, or regions, will preferably have an affinity for any of 
human MAdCAM-1(a-e), expressed as Ka, of at least 10⁸ M⁻¹, more preferably, 
at least 10⁹ M⁻¹, such as 5 X 10⁸ M⁻¹, 8 X 10⁸ M⁻¹, 2 X 10⁹ M⁻¹, 4 X 10⁹ M⁻¹, 
6 X 10⁹ M⁻¹, 8 X 10⁹ M⁻¹. 
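
An affinity expressed as an association constant Ka (in M⁻¹) is often easier to compare when restated as a dissociation constant, Kd = 1/Ka. A minimal Python sketch of that conversion follows; the function name and the sample Ka values are assumptions chosen for illustration.

```python
def kd_nanomolar(ka_per_molar: float) -> float:
    """Convert an association constant Ka (M^-1) to a dissociation constant Kd in nM."""
    if ka_per_molar <= 0:
        raise ValueError("Ka must be positive")
    # Kd (M) = 1 / Ka; the factor 1e9 converts molar to nanomolar.
    return 1e9 / ka_per_molar


for ka in (1e8, 5e8, 1e9, 8e9):  # hypothetical example affinities
    print(f"Ka = {ka:.0e} M^-1  ->  Kd = {kd_nanomolar(ka):.3g} nM")
```

A larger Ka therefore corresponds to a smaller Kd, that is, to tighter binding.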

Preferred for human therapeutic use are high affinity murine and 
murine/human or human/human chimeric antibodies, and fragments, regions and 
derivatives having potent in vivo MAdCAM-1-inhibiting and/or neutralizing 
activity, according to the present invention, e.g., that block 
MAdCAM-1-mediated cell adhesion activity, in vivo, in situ, and in vitro. 

Selection of Compounds Capable of Regulating Expression of MAdCAM-1 

As the invention also includes isolated genomic DNA molecules 
comprising the 5' flanking region of MAdCAM-1(a-e), including the promoter for 
these splice variants, yet another aspect of the invention is related to a method for 
identifying compounds capable of enhancing or inhibiting expression of any of 
MAdCAM-1(a-e). In order to determine the effect of such compounds, reporter 
plasmids are constructed by linking a portion of the DNA located 5' to the 
transcription start site of any of MAdCAM-1(a-e) in front of a reporter gene. 
Such constructs are then transfected into appropriate cell lines. Compounds that 
are to be tested for their ability to increase or decrease expression from the 
MAdCAM-1 promoter are then administered to the cell bearing the reporter 
construct, and the effect of each compound on reporter gene expression is 
determined by comparing that level of expression to the expression level in a 
control cell bearing the reporter construct, where the test compound has not been 
administered to the control cell. 

The DNA sequence of the 5' flanking region of the MAdCAM-1 gene is 
shown in Figure 6 (SEQ ID NO:33). For a full description of this region, see 
Example 6, below. Of course, since the nucleotide sequence is known, routine 
methods are available for producing such nucleic acid molecules synthetically 
(see, for example, Synthesis and Application of DNA and RNA, S. A. Narang, 
ed., 1987, Academic Press, San Diego, CA). Alternatively, such isolated 
nucleic acid molecules of the present invention can be generated as follows. 
The MAdCAM-1 gene promoter region is obtained by amplification using the 
polymerase chain reaction (PCR). The amplified fragment is then inserted into 
an appropriate plasmid (such as, for example, pCAT™ (Promega, Madison, 
WI)). Nested deletion plasmids are then generated using the commercially 
available "Erase-a-Base" System (Promega, Madison, WI) as described in 
Henikoff, Gene 28:351-359 (1984). Thus, only routine experimentation would 
be required to generate any of the isolated nucleic acid molecules of the present 
invention which are capable of enhancing or inhibiting gene expression. 

The nucleic acid molecules of the present invention can include the 
MAdCAM-1 promoter and cis-acting enhancer and/or silencer elements 
capable of affecting gene transcription. For simplicity, these isolated nucleic 
acid molecules of the present invention are referred to below as "MAdCAM-1 
transcriptional regulatory elements" or "transcriptional elements." As 
indicated, to determine the effect of a transcriptional element of the present 
invention on gene expression, nested deletion reporter plasmids can be 
generated containing a transcriptional element of the present invention linked 
in front of the chloramphenicol acetyltransferase (CAT) reporter gene. Such 
recombinant DNA molecules of the present invention actually generated by the 
inventors include transcriptional elements inserted, in both orientations, into the 
XbaI site of pBLCAT2 vector (Luckow, B., Schütz, G., Nucleic Acids Res. 
15:5490 (1987)). 

By the invention, a recombinant DNA molecule containing a 
transcriptional element of the present invention is used to transiently transfect 
an appropriate cell line such as, for example, human choriocarcinoma cell lines 
(JEG-3 and JAR), the human prostate carcinoma cell line PC-3, or the monkey 
kidney cell line CV-1, all of which are available from the American Type 
Culture Collection. In addition to using the CAT system for reporter gene 
analyses, the hGH transient expression system can also be used (Selden et al., 
Mol. Cell. Biol. 6:3173-3179 (1986)) or other systems that are based on the 
expression of β-galactosidase (An et al., Mol. Cell. Biol. 2:1628-1632 (1982)) 
and xanthine-guanine phosphoribosyl transferase (Chu et al., Nucleic Acids 
Res. 13:2921-2930 (1985)). 

A transcriptional element of the present invention may be inserted into 
an appropriate vector in accordance with conventional techniques, including 
blunt-ending or staggered-ending termini for ligation, restriction enzyme 
digestion to provide appropriate termini, filling in of cohesive ends as 
appropriate, alkaline phosphatase treatment to avoid undesirable joining, and 
ligation with appropriate ligases. Techniques for such manipulations are 
disclosed by Maniatis, T., et al., infra, and are well known in the art. Clones 
containing a transcriptional element of the present invention may be identified 
by any means which specifically selects for a MAdCAM-1 enhancer or silencer 
region DNA such as, for example, by hybridization with an appropriate nucleic 
acid probe(s) containing a sequence complementary to all or part of the 
transcriptional element. Oligonucleotide probes specific for a transcriptional 
element of the present invention can be designed simply by reference to SEQ 
ID NO:33. Techniques for nucleic acid hybridization and clone identification 
are disclosed by Maniatis, T., et al. (In: Molecular Cloning, A Laboratory 
Manual, Cold Spring Harbor Laboratories, Cold Spring Harbor, NY (1982)), 
and by Hames, B.D., et al. (In: Nucleic Acid Hybridization, A Practical 
Approach, IRL Press, Washington, DC (1985)). To facilitate the detection of 
the desired clone containing a transcriptional element of the present invention, 
the above-described nucleic acid probe may be labeled with a detectable group. 
Such detectable groups can be any material having a detectable physical or 
chemical property. Such materials have been well-developed in the field of 
nucleic acid hybridization and in general most any label useful in such methods 
can be applied to the present invention. Particularly useful are radioactive 
labels, such as ³²P, ³H, ¹⁴C, ³⁵S, or the like. Any radioactive label may be 
employed which provides for an adequate signal and has a sufficient half-life. 
The oligonucleotide may be radioactively labeled, for example, by 
"nick-translation" by well-known means, as described in, for example, Rigby, 
P.J.W., et al., J. Mol. Biol. 113:237 (1977) and by T4 DNA polymerase 
replacement synthesis as described in, for example, Deen, K.C., et al., Anal. 
Biochem. 135:456 (1983). Alternatively, polynucleotides are also useful as 
nucleic acid hybridization probes when labeled with a non-radioactive marker 
such as biotin, an enzyme or a fluorescent group. See, for example, Leary, 
J.J., et al., Proc. Natl. Acad. Sci. USA 80:4045 (1983); Renz, M., et al., Nucl. 
Acids Res. 12:3435 (1984); and Renz, M., EMBO J. 6:817 (1983). 

As used herein, "heterologous protein" is intended to refer to a peptide 
sequence that is heterologous to the transcriptional regulatory elements of the 
invention. A skilled artisan will recognize that, if desired, the teaching herein 
will also apply to the expression of genetic sequences encoding the MAdCAM- 
1 protein, or splice variants thereof, by such transcriptional regulatory 
elements. The reporter genes for use in the screening assay described below 
can code for either the MAdCAM-1 protein, or splice variants thereof, or a 
heterologous protein. Alternatively, detection of reporter gene expression can 
be at the mRNA level, such as, for example, detection of MAdCAM-1 mRNA. 

To express a reporter gene under the control of the transcriptional 
regulatory elements of the invention, the gene must be "operably-linked" to the 
regulatory element. An operable linkage is a linkage in which a desired 
sequence is connected to a transcriptional or translational regulatory sequence 
(or sequences) in such a way as to place expression (or operation) of the desired 
sequence under the influence or control of the regulatory sequence. 

Two DNA sequences (such as a reporter gene and a promoter region 
sequence linked to the 5' end of the reporter gene) are said to be operably 
linked if induction of promoter function results in the transcription of the 
reporter gene and if the nature of the linkage between the two DNA sequences 
does not (1) result in the introduction of a frame-shift mutation (if reporter 
protein activity is necessary for detection of reporter gene expression), (2) 
interfere with the ability of the expression regulatory sequences to direct reporter 
gene expression, or (3) interfere with the ability of the reporter gene to be 
transcribed by the promoter region sequence. Thus, a promoter would be 
operably linked to a DNA sequence if the promoter were capable of affecting 
transcription of that DNA sequence. 

In a similar manner, a transcriptional regulatory element of the present 
invention that enhances or represses gene expression may be operably-linked 
to such a promoter. Exact placement of the element in the nucleotide chain is 
not critical as long as the element is located at a position from which the 
desired effects on the operably linked promoter may be revealed. A nucleic acid 
molecule, such as DNA, is said to be "capable of expressing" a polypeptide if 
it contains expression control sequences which contain transcriptional 
regulatory information and such sequences are operably linked to the nucleotide 
sequence which encodes the polypeptide. For the complete control of gene 
expression, all transcriptional and translational regulatory elements (or signals) 
that are operably linked to a heterologous gene should be recognizable by the 
appropriate host. By "recognizable" in a host is meant that such signals are 
functional in such host. 

The MAdCAM-1 transcriptional regulatory elements of the present 
invention, obtained through the methods described above, and preferably in a 
double-stranded form, may be operably linked to a heterologous gene (such as 
a reporter gene), preferably in an expression vector, and introduced into a host 
cell, preferably a eukaryotic cell, to assay reporter gene expression. Preferred 
eukaryotic cells include choriocarcinoma cell lines, breast cancer cell lines, 
prostate carcinoma cell lines and kidney cell lines. 

As is widely known, translation of eukaryotic mRNA is initiated at the 
codon that encodes the first methionine. For this reason, it is preferable to 
ensure that the linkage between a eukaryotic promoter and a reporter gene does 
not contain any intervening codons that are capable of encoding a methionine. 
The presence of such codons results either in a formation of a fusion protein (if 
the AUG codon is in the same reading frame as the DNA encoding the 
heterologous protein) or a frame-shift mutation (if the AUG codon is not in the 
same reading frame as the reporter gene). 
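
The reading-frame reasoning above can be sketched in Python. The function name and the linker sequences are hypothetical. The sketch only locates ATG triplets in the DNA joining a promoter to a reporter gene and reports whether each is in frame with a reporter start codon that is assumed to follow the linker immediately; it ignores stop codons and sequence context.

```python
def upstream_atgs(linker: str) -> list[tuple[int, bool]]:
    """List (position, in_frame) for each ATG found in the linker.

    in_frame is True when the distance from the ATG to the end of the linker
    is a multiple of three, i.e. the ATG shares the reporter's reading frame.
    """
    linker = linker.upper()
    hits = []
    pos = linker.find("ATG")
    while pos != -1:
        hits.append((pos, (len(linker) - pos) % 3 == 0))
        pos = linker.find("ATG", pos + 1)
    return hits


for linker in ("GGATGCCC", "GGATGCC", "GGCCGG"):  # hypothetical linkers
    print(linker, upstream_atgs(linker))
```

An empty result corresponds to the preferred case described above, in which the linkage contains no intervening methionine codon.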

If desired, a fusion product of a reporter protein may be constructed. 
For example, the sequence coding for the reporter protein may be linked to a 
signal sequence which will allow secretion of the protein from, or the 
compartmentalization of the protein in, a particular host. Such signal sequences 
may be designed with or without specific protease sites such that the signal 
peptide sequence is amenable to subsequent removal. Alternatively, the native 
signal sequence for this protein may be used. 

The transcriptional regulatory elements of the invention can be selected 
to allow for repression or activation, so that expression of the operably linked 
reporter genes can be modulated. Translational signals are not necessary when 
it is desired to express antisense RNA sequences or to assay reporter gene 
expression via mRNA detection. 

If desired, the non-transcribed and/or non-translated regions 3' to the 
reporter gene can be obtained by the above-described cloning methods. The 
3'-non-transcribed region may be retained for its transcriptional termination 
regulatory sequence elements; the 3'-non-translated region may be retained for 
its translational termination regulatory sequence elements, or for those elements 
that direct polyadenylation in eukaryotic cells. Where the native expression 
control sequence signals do not function satisfactorily in the host cell, then 
sequences functional in the host cell may be substituted. 

To transform a mammalian cell with the DNA constructs of the 
invention, many vector systems are available, depending upon whether it is 
desired to insert the reporter gene product into the host cell chromosomal 
DNA, or to allow it to exist in an extrachromosomal form. If the reporter gene 
and an operably linked promoter are introduced into a recipient eukaryotic cell 
as a non-replicating DNA (or RNA) molecule, which may either be a linear 
molecule or, more preferably, a closed covalent circular molecule that is 
incapable of autonomous replication, reporter gene expression may occur 
through the transient expression of the introduced sequence. 

Genetically stable transformants may be constructed with vector 
systems, or transformation systems, whereby the reporter gene is integrated 
into the host chromosome. Such integration may occur de novo within the cell 
or, in a most preferred embodiment, be assisted by transformation with a vector 
that functionally inserts itself into the host chromosome. Vectors capable of 
chromosomal insertion include, for example, retroviral vectors, transposons or 
other DNA elements which promote integration of DNA sequences in 
chromosomes, especially DNA sequences homologous to a desired chromosomal 
insertion site. 

Cells that have stably integrated the introduced DNA into their 
chromosomes are selected by also introducing one or more markers that allow 
for selection of host cells which contain the desired sequence. For example, the 
marker may provide biocide resistance, e.g., resistance to antibiotics, or heavy 
metals, such as copper, or the like. The selectable marker gene can either be 
directly linked to the reporter gene, or introduced into the same cell by co- 
transfection. In another embodiment, the introduced sequence is incorporated 
into a plasmid or viral vector capable of autonomous replication in the recipient 
host. Any of a wide variety of vectors may be employed for this purpose, as 
outlined below. Factors of importance in selecting a particular plasmid or viral 
vector include: the ease with which recipient cells that contain the vector may 
be recognized and selected from those recipient cells which do not contain the 
vector; the number of copies of the vector which are desired in a particular 
host; and whether it is desirable to be able to "shuttle" the vector between host 
cells of different species. 

Preferred eukaryotic plasmids include those derived from the bovine 
papilloma virus, vaccinia virus, and SV40. Such plasmids are well known in 
the art and are commonly or commercially available. For example, mammalian 
expression vector systems in which it is possible to cotransfect with a helper 
virus to amplify plasmid copy number, and integrate the plasmid into the 
chromosomes of host cells have been described (Perkins, A.S. et al., Mol. Cell. 
Biol. 3:1123 (1983); Clontech, Palo Alto, California). Particularly preferred 
are vectors derived from pCAT-Basic, pCAT-Enhancer and pCAT-Promoter 
vectors (Promega, Madison, WI). 

Once the vector or DNA sequence containing the construct(s) is 
prepared for expression, the DNA construct(s) is introduced into an appropriate 
host cell by any of a variety of suitable means, including transfection, 
electroporation or delivery by liposomes. DEAE-dextran, calcium phosphate, 
and preferably, the transfection reagent DOTAP, may be useful in the 
transfection protocol. 

After the introduction of the vector in vitro, recipient cells are grown 
in a selective medium, that is, medium that selects for the growth of vector- 
containing cells. Expression of the reporter gene results in the production 
of mRNA and, if desired, reporter protein. According to the invention, this 
expression can take place in a continuous manner in the transformed cells, or 
in a controlled manner. If desired, in in vitro culture, the reporter protein is 
isolated and purified in accordance with conventional conditions, such as 
extraction, precipitation, chromatography, affinity chromatography, 
electrophoresis, or the like. Alternatively, levels of reporter protein expression 
can be assayed according to conventional protein assays, such as, for example, 
the CAT expression system. 

The MAdCAM-1 transcriptional regulatory elements of the present 
invention (i.e., the MAdCAM-1 promoter, as well as isolated nucleic acid 
molecules capable of enhancing and/or repressing gene expression) are useful 
for screening drugs, ligands and/or other trans-acting agents to determine 
which are capable of affecting expression of MAdCAM-1 or any splice variant 
thereof. By the invention, trans-acting factors can be identified by their ability 
to up-regulate or down-regulate MAdCAM-1 expression. As used herein, by 
"MAdCAM-1 trans-acting agent" is intended a drug, ligand, or other 
compound capable of interacting, either directly or indirectly, with a MAdCAM-1 
transcriptional regulatory element of the present invention to enhance or repress 
gene expression. Such MAdCAM-1 trans-acting agents which interact 
directly with a transcriptional regulatory element of the present invention 
include those which, for example, bind directly to the element and either 
enhance or repress gene expression. MAdCAM-1 trans-acting agents which 
interact indirectly with a transcriptional regulatory element of the present 
invention include those which, for example, bind to and induce activity of a 
second trans-acting agent (e.g., a receptor molecule) which itself then, either 
alone or complexed to the first trans-acting agent, binds to the element and 
either enhances or represses gene expression. One type of trans-acting agent 
is a triplex-forming oligonucleotide. Administration of a suitable 
oligonucleotide will result in the formation of a triple helix between the 
oligonucleotide and the MAdCAM-1 promoter, which will inhibit transcription 
from that promoter (Ebbinghaus, S.W. et al., Gene Therapy 3:287-297 (1996); 
Roy, C., Eur. J. Biochem. 220:493-503 (1994)). Because the genomic 
sequence of the region 5' of the MAdCAM-1 gene is given herein (see FIG. 
6 and SEQ ID NO:33), one of ordinary skill in the art will readily be able to 
design suitable oligonucleotides (also called "anti-sense" oligonucleotides) 
which can inhibit expression from the MAdCAM-1 promoter. One region 
which is especially useful for anti-sense design is the 5' untranslated region (J. 
Biol. Chem. 266:18162-18171 (1991)), which of course is not included in a 
cDNA, but is included in the genomic sequence disclosed herein. 

Thus, in one aspect, the invention provides a screening assay for 
determining whether any given compound is capable of up-regulating or down- 
regulating expression from the MAdCAM-1 promoter, leading to an increase 
or decrease of MAdCAM-1 production. 

The screening assay involves (1) providing a host cell transfected with 
a recombinant nucleic acid molecule containing a MAdCAM-1 transcriptional 
regulatory element of the present invention and a reporter gene, wherein the 
transcriptional element is operably linked to the reporter gene; (2) 
administering a candidate MAdCAM-1 trans-acting agent to the transfected host 
cell; and (3) determining the effect on reporter gene expression.

In a preferred embodiment, the invention provides a screening assay for 
the identification of substances capable of altering the expression from the 
MAdCAM-1 promoter, comprising: 

(a) measuring the level of expression of a reporter gene in 
a test cell, wherein said test cell is transformed with a recombinant DNA 
molecule comprising a reporter gene operably linked to a DNA molecule 
comprising the promoter of MAdCAM-1, and wherein a candidate MAdCAM-1
trans-acting agent is administered to said test cell;

(b) measuring the level of expression of said reporter gene 
in a control cell, wherein said control cell is transformed with the recombinant 
DNA molecule of step (a); and 

(c) comparing the level of expression of said reporter gene 
in said test cell to the level of said reporter gene in said control cell. 
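
As an illustrative aid only, the comparison in steps (a) through (c) can be sketched numerically. The following Python sketch is not part of the disclosed assay: the function name, the averaging of replicate readings, and the two-fold cutoff are assumptions chosen for illustration, since the text does not specify how large a difference counts as an effect.

```python
from statistics import mean


def classify_agent(test_levels, control_levels, fold_threshold=2.0):
    """Compare reporter expression in test cells and control cells.

    test_levels / control_levels: replicate reporter readings (arbitrary units).
    fold_threshold: hypothetical cutoff; the source text does not give one.
    Returns (fold_change, description).
    """
    control_mean = mean(control_levels)
    if control_mean <= 0:
        raise ValueError("control reporter level must be positive")
    fold_change = mean(test_levels) / control_mean
    if fold_change >= fold_threshold:
        effect = "up-regulates"
    elif fold_change <= 1.0 / fold_threshold:
        effect = "down-regulates"
    else:
        effect = "no clear effect"
    return fold_change, effect
```

Under these assumptions, a candidate agent whose test cells average 320 units against a control average of 100 units would be reported as up-regulating expression from the promoter.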

Suitable and preferred host cells, transfection methods, expression
vectors, promoters, and reporter genes are described above and will be known
in the art.

Having generally described the invention, the same will be more readily
understood by reference to the following examples, which are provided by way
of illustration and are not intended as limiting.

Examples 

Example 1: Expression and Purification of any of MAdCAM-1(a-e) in
E. coli

The DNA sequence encoding any of the mature MAdCAM-1(a-e) proteins
is amplified using PCR oligonucleotide primers specific to the amino terminal
sequences of the desired MAdCAM-1(a-e) protein and to vector sequences 3' to
the gene. Additional nucleotides containing restriction sites to facilitate cloning
are added to the 5' and 3' sequences respectively.

To obtain the DNA sequence encoding MAdCAM-1(a), the plasmid
HEBBC23 is used, along with the primers given below.

To obtain the DNA sequence encoding MAdCAM-1(b), the plasmid
HSKCW36 is used, along with the primers given below.

To obtain the DNA sequence encoding MAdCAM-1(c), the plasmid
MAdCAM-1c is used, along with the primers given below.

To obtain the DNA sequence encoding MAdCAM-1(d), the plasmid
MAdCAM-1d is used, along with the primers given below.

To obtain the DNA sequence encoding MAdCAM-1(e), the plasmid
MAdCAM-1e is used, along with the primers given below.

The 5' oligonucleotide primer has the sequence 5' cgc ccatgg gc cag tcc
ctc cag gtg 3' (SEQ ID NO: 11) containing the underlined NcoI restriction site,
which encodes 17 nucleotides of the coding sequence of the gene encoding any
of the MAdCAM-1(a-e) proteins shown in FIGS. 1-5 (SEQ ID NOs:1, 3, 5, 7, 9),
respectively, beginning immediately after the signal peptide.

The 3' primer has the sequence 5' cgc aagctt tca ggg cag ctg gtc acc cgc 3'
(SEQ ID NO: 12) containing the underlined HindIII restriction site followed by
nucleotides complementary to nucleotides 940-967 of FIG. 1, which follow
immediately after the coding sequence of any of MAdCAM-1(a-e).
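
Because the underlining that marked the restriction sites is lost in plain text, it can help to locate the sites programmatically. The short Python sketch below is offered only as an illustration: the helper name is hypothetical, and the primer strings are transcribed from the text with the scanned characters read as valid bases, which is itself an assumption.

```python
# Recognition sequences of the two enzymes named in the text.
SITES = {"NcoI": "ccatgg", "HindIII": "aagctt"}


def find_sites(primer):
    """Return {enzyme: 0-based start} for each recognition site present in the primer."""
    seq = primer.replace(" ", "").lower()
    return {name: seq.find(site) for name, site in SITES.items() if site in seq}


# Primers as read from the text (transcription is an assumption, see lead-in).
FWD_PRIMER = "cgc ccatgg gc cag tcc ctc cag gtg"        # SEQ ID NO: 11
REV_PRIMER = "cgc aagctt tca ggg cag ctg gtc acc cgc"   # SEQ ID NO: 12

fwd_sites = find_sites(FWD_PRIMER)
rev_sites = find_sites(REV_PRIMER)
```

In both primers the site sits directly after the three-base "cgc" clamp at the 5' end, which matches the description of restriction sites being added to the ends of the amplified sequence.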

The restriction sites are convenient to restriction enzyme sites in the
bacterial expression vector pQE60, which is used for bacterial expression in
these examples (Qiagen, Inc., 9259 Eton Avenue, Chatsworth, CA, 91311).
pQE60 encodes ampicillin antibiotic resistance ("Amp^r") and contains a bacterial
origin of replication ("ori"), an IPTG inducible promoter, a ribosome binding site
("RBS"), a 6-His tag and restriction enzyme sites.

The amplified DNA encoding any of MAdCAM-1(a-e) and the vector
pQE60 both are digested with NcoI and HindIII and the digested DNAs are then
ligated together. Insertion of the DNA encoding any of the MAdCAM-1(a-e)
proteins into the restricted pQE60 vector places the coding region of
MAdCAM-1(a-e) downstream of and operably linked to the vector's IPTG-inducible
promoter and in-frame with an initiating AUG appropriately positioned for
translation of the appropriate MAdCAM-1(a-e) protein.

The ligation mixture is transformed into competent E. coli cells using
standard procedures. Such procedures are described in Sambrook et al.,
MOLECULAR CLONING: A LABORATORY MANUAL, 2nd Ed.; Cold
Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (1989). E. coli strain
M15/rep4, containing multiple copies of the plasmid pREP4, which expresses lac
repressor and confers kanamycin resistance ("Kan^r"), is used in carrying out the
illustrative example described herein. This strain, which is only one of many that
are suitable for expressing any of the MAdCAM-1(a-e) proteins, is available
commercially from Qiagen.

Transformants are identified by their ability to grow on LB plates in the
presence of ampicillin and kanamycin. Plasmid DNA is isolated from resistant
colonies and the identity of the cloned DNA confirmed by restriction analysis.

Clones containing the desired constructs are grown overnight ("O/N") in
liquid culture in LB media supplemented with both ampicillin (100 µg/ml) and
kanamycin (25 µg/ml).

The O/N culture is used to inoculate a large culture, at a dilution of
approximately 1:100 to 1:250. The cells are grown to an optical density at 600 nm
("OD600") of between 0.4 and 0.6. Isopropyl-β-D-thiogalactopyranoside
("IPTG") is then added to a final concentration of 1 mM to induce transcription
from lac repressor sensitive promoters, by inactivating the lacI repressor. Cells
subsequently are incubated further for 3 to 4 hours. Cells then are harvested by
centrifugation and disrupted, by standard methods. Inclusion bodies are purified
from the disrupted cells using routine collection techniques, and protein is
solubilized from the inclusion bodies into 8M urea. The 8M urea solution
containing the solubilized protein is passed over a PD-10 column in 2X
phosphate-buffered saline ("PBS"), thereby removing the urea, exchanging the
buffer and refolding the protein. The protein is purified by a further step of
chromatography to remove endotoxin. Then, it is sterile filtered. The sterile
filtered protein preparation is stored in 2X PBS at a concentration of 95 µg/ml.

Example 2: Cloning and Expression of any of the MAdCAM-1(a-e) proteins
in a Baculovirus Expression System

In this illustrative example, the plasmid shuttle vector pA2 is used to
insert the cloned DNA encoding the complete protein, including its naturally
associated secretory signal (leader) sequence, into a baculovirus to express any
of the mature proteins MAdCAM-1(a-e), using standard methods as described in
Summers et al., A Manual of Methods for Baculovirus Vectors and Insect Cell
Culture Procedures, Texas Agricultural Experimental Station Bulletin No. 1555
(1987). This expression vector contains the strong polyhedrin promoter of the
Autographa californica nuclear polyhedrosis virus (AcMNPV) followed by
convenient restriction sites such as BamHI and Asp718. The polyadenylation site
of the simian virus 40 ("SV40") is used for efficient polyadenylation. For easy
selection of recombinant virus, the plasmid contains the beta-galactosidase gene
from E. coli under control of a weak Drosophila promoter in the same orientation,
followed by the polyadenylation signal of the polyhedrin gene. The inserted
genes are flanked on both sides by viral sequences for cell-mediated homologous
recombination with wild-type viral DNA to generate viable virus that expresses the
cloned polynucleotide.

Many other baculovirus vectors could be used in place of the vector
above, such as pAc373, pVL941 and pAcIM1, as one skilled in the art would
readily appreciate, as long as the construct provides appropriately located signals
for transcription, translation, secretion and the like, including a signal peptide and
an in-frame AUG as required. Such vectors are described, for instance, in
Luckow et al., Virology 170:31-39.

The cDNA sequence encoding any of the full length MAdCAM-1(a-e)
proteins is amplified using PCR oligonucleotide primers corresponding to the 5'
and 3' sequences of the gene.

To obtain the DNA sequence encoding MAdCAM-1(a), the plasmid
HEBBC23 is used, along with the primers given below.

To obtain the DNA sequence encoding MAdCAM-1(b), the plasmid
HSKCW36 is used, along with the primers given below.

To obtain the DNA sequence encoding MAdCAM-1(c), the plasmid
MAdCAM-1c is used, along with the primers given below.

To obtain the DNA sequence encoding MAdCAM-1(d), the plasmid
MAdCAM-1d is used, along with the primers given below.

To obtain the DNA sequence encoding MAdCAM-1(e), the plasmid
MAdCAM-1e is used, along with the primers given below.

The 5' primer has the sequence 5' cgc ggatcc gcc atc atg gat ttc gga ctg gcc
3' (SEQ ID NO:13) containing the underlined BamHI restriction enzyme site
followed by 18 bases of the sequence of the relevant MAdCAM-1(a-e) protein
shown in FIGS. 1-5, respectively. Inserted into an expression vector, as described
below, the 5' end of the amplified fragment encoding the relevant
MAdCAM-1(a-e) protein provides an efficient signal peptide. An efficient signal for initiation
of translation in eukaryotic cells, as described by Kozak, M., J. Mol. Biol. 196:
947-950 (1987) is appropriately located in the vector portion of the construct.

The 3' primer has the sequence 5' cgc ggtacc tca ctt gaa ggg gtc caa gc 3'
(SEQ ID NO:14) containing the underlined Asp718 restriction site followed by
nucleotides complementary to nucleotides 1183-1199 of FIG. 1, which follow
immediately after the coding sequence of any of MAdCAM-1(a-e).

The cDNA sequence encoding the extracellular soluble domain of any of
the MAdCAM-1(a-e) proteins is amplified using PCR oligonucleotide primers
corresponding to the 5' and 3' sequences of the gene.

To obtain the DNA sequence encoding MAdCAM-1(a), the plasmid
HEBBC23 is used, along with the primers given below.

To obtain the DNA sequence encoding MAdCAM-1(b), the plasmid
HSKCW36 is used, along with the primers given below.

To obtain the DNA sequence encoding MAdCAM-1(c), the plasmid
MAdCAM-1c is used, along with the primers given below.

To obtain the DNA sequence encoding MAdCAM-1(d), the plasmid
MAdCAM-1d is used, along with the primers given below.

To obtain the DNA sequence encoding MAdCAM-1(e), the plasmid
MAdCAM-1e is used, along with the primers given below.

The 5' primer has the sequence 5' cgc ggatcc gcc atc atg gat ttc gga ctg gcc
3' (SEQ ID NO: 15), containing the underlined BamHI restriction enzyme site
followed by 18 bases of the sequence of the relevant MAdCAM-1(a-e) protein
shown in FIGS. 1-5, respectively. Inserted into an expression vector, as described
below, the 5' end of the amplified fragment encoding the relevant
MAdCAM-1(a-e) protein provides an efficient signal peptide. An efficient signal for initiation
of translation in eukaryotic cells, as described by Kozak, M., J. Mol. Biol. 196:
947-950 (1987) is appropriately located in the vector portion of the construct.

The 3' primer has the sequence 5' cgc ggtacc tca ggg cag ctg gtc acc cgc
3' (SEQ ID NO: 16) containing the underlined Asp718 restriction site followed
by nucleotides complementary to nucleotides 940-967 of FIG. 1, which follow
immediately after the coding sequence of any of MAdCAM-1(a-e).
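
A 3' primer is written as the reverse complement of the coding strand, so the "tca" that follows the Asp718 site reads as a "tga" stop codon on the sense strand. The Python sketch below illustrates that relationship; the helper is hypothetical, and the primer string is transcribed from the text with scanned characters read as valid bases, which is an assumption.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence, ignoring spaces."""
    return seq.replace(" ", "").lower().translate(COMPLEMENT)[::-1]


# 3' primer for the soluble domain (SEQ ID NO: 16), as read from the text.
PRIMER_3 = "cgc ggtacc tca ggg cag ctg gtc acc cgc"

# Sense-strand view of the primer: coding bases, stop codon, Asp718 site, clamp.
sense = reverse_complement(PRIMER_3)
```

Read in the sense direction, the 18 coding bases are followed by the stop codon, the palindromic Asp718 site (ggtacc) and the terminal clamp, which is the order needed for the amplified fragment to terminate translation before the cloning site.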

The amplified fragment is isolated from a 1% agarose gel using a
commercially available kit ("Geneclean," BIO 101 Inc., La Jolla, Ca.). The
fragment then is digested with BamHI and Asp718 and again is purified on a 1%
agarose gel. This fragment is designated herein "F2".

The plasmid is digested with the restriction enzymes BamHI and Asp718
and then is dephosphorylated using calf intestinal phosphatase, using routine
procedures known in the art. The DNA is then isolated from a 1% agarose gel
using a commercially available kit ("Geneclean," BIO 101 Inc., La Jolla, Ca.).
This vector DNA is designated herein "V2".

Fragment F2 and the dephosphorylated plasmid V2 are ligated together
with T4 DNA ligase. E. coli HB101 cells are transformed with ligation mix and
spread on culture plates. Bacteria are identified that contain the plasmid with the
desired human gene encoding MAdCAM-1(a-e) by digesting DNA from
individual colonies using XbaI and then analyzing the digestion product by gel
electrophoresis. The sequence of the cloned fragment is confirmed by DNA
sequencing. This plasmid is designated herein pBacMAdCAM-1(a-e) (i.e., if
MAdCAM-1(a) is cloned, the plasmid is pBacMAdCAM-1(a), while if
MAdCAM-1(b) is cloned, the plasmid is pBacMAdCAM-1(b), etc.).

5 µg of the plasmid pBacMAdCAM-1(a-e) is co-transfected with 1.0 µg
of a commercially available linearized baculovirus DNA ("BaculoGold™
baculovirus DNA", Pharmingen, San Diego, CA.), using the lipofection method
described by Felgner et al., Proc. Natl. Acad. Sci. USA 84: 7413-7417 (1987).
1 µg of BaculoGold™ virus DNA and 5 µg of the plasmid pBacMAdCAM-1(a-e)
are mixed in a sterile well of a microtiter plate containing 50 µl of serum-free
Grace's medium (Life Technologies Inc., Gaithersburg, MD). Afterwards 10 µl
Lipofectin plus 90 µl Grace's medium are added, mixed and incubated for 15
minutes at room temperature. Then the transfection mixture is added drop-wise
to Sf9 insect cells (ATCC CRL 1711) seeded in a 35 mm tissue culture plate with
1 ml Grace's medium without serum. The plate is rocked back and forth to mix
the newly added solution. The plate is then incubated for 5 hours at 27°C. After
5 hours the transfection solution is removed from the plate and 1 ml of Grace's
insect medium supplemented with 10% fetal calf serum is added. The plate is put
back into an incubator and cultivation is continued at 27°C for four days.




After four days the supernatant is collected and a plaque assay is 
performed, as described by Summers and Smith, cited above. An agarose gel 
with "Blue Gal" (Life Technologies Inc., Gaithersburg) is used to allow easy 
identification and isolation of gal-expressing clones, which produce blue-stained 
plaques. (A detailed description of a "plaque assay" of this type can also be 
found in the user's guide for insect cell culture and baculovirology distributed by 
Life Technologies Inc., Gaithersburg, page 9-10). 

Four days after serial dilution, the virus is added to the cells. After
appropriate incubation, blue stained plaques are picked with the tip of an
Eppendorf pipette. The agar containing the recombinant viruses is then
resuspended in an Eppendorf tube containing 200 µl of Grace's medium. The agar
is removed by a brief centrifugation and the supernatant containing the
recombinant baculovirus is used to infect Sf9 cells seeded in 35 mm dishes. Four
days later the supernatants of these culture dishes are harvested and then they are
stored at 4°C. A clone containing any of the properly inserted genes encoding
MAdCAM-1(a-e) is identified by DNA analysis including restriction mapping
and sequencing. This is designated herein as V-MAdCAM-1(a-e); i.e.,
V-MAdCAM-1(a), or V-MAdCAM-1(b), etc., depending on which MAdCAM-1
variant is cloned.

Sf9 cells are grown in Grace's medium supplemented with 10% heat-
inactivated FBS. The cells are infected with the recombinant baculovirus
V-MAdCAM-1(a-e) at a multiplicity of infection ("MOI") of about 2 (about 1 to
about 3). Six hours later the medium is removed and is replaced with SF900 II
medium minus methionine and cysteine (available from Life Technologies Inc.,
Gaithersburg). 42 hours later, 5 µCi of 35S-methionine and 5 µCi 35S-cysteine
(available from Amersham) are added. The cells are further incubated for 16
hours and then they are harvested by centrifugation, lysed and the labeled proteins
are visualized by SDS-PAGE and autoradiography.
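
The multiplicity of infection relates the amount of virus added to the number of cells infected. As a hedged illustration, the Python sketch below computes the volume of virus stock needed for a chosen MOI; the function name is hypothetical, and the cell count and titer used in any call are assumed values, since the text gives only the target MOI of about 2.

```python
def virus_volume_ml(moi, cell_count, titer_pfu_per_ml):
    """Volume of virus stock (ml) that delivers `moi` plaque-forming units per cell.

    moi: desired multiplicity of infection (the text uses about 2).
    cell_count: number of cells to infect (assumed by the caller).
    titer_pfu_per_ml: stock titer from a plaque assay (assumed by the caller).
    """
    if titer_pfu_per_ml <= 0:
        raise ValueError("titer must be positive")
    return moi * cell_count / titer_pfu_per_ml
```

For example, with an assumed one million cells and an assumed stock titer of 1e8 pfu/ml, an MOI of 2 corresponds to 0.02 ml of stock.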

Example 3: Cloning and Expression in Mammalian Cells 

Most of the vectors used for the transient expression of the gene sequence
encoding any of the MAdCAM-1(a-e) proteins in mammalian cells should carry the
SV40 origin of replication. This allows the replication of the vector to high copy
numbers in cells (e.g. COS cells) which express the T antigen required for the
initiation of viral DNA synthesis. Any other mammalian cell line can also be
utilized for this purpose.

A typical mammalian expression vector contains the promoter element,
which mediates the initiation of transcription of mRNA, the protein coding
sequence, and signals required for the termination of transcription and
polyadenylation of the transcript. Additional elements include enhancers, Kozak
sequences and intervening sequences flanked by donor and acceptor sites for
RNA splicing. Highly efficient transcription can be achieved with the early and
late promoters from SV40, the long terminal repeats (LTRs) from retroviruses,
e.g. RSV, HTLVI, HIVI and the early promoter of the cytomegalovirus (CMV).
However, cellular signals can also be used (e.g. human actin promoter). Suitable
expression vectors for use in practicing the present invention include, for
example, vectors such as pSVL and pMSG (Pharmacia, Uppsala, Sweden),
pRSVcat (ATCC 37152), pSV2dhfr (ATCC 37146) and pBC12MI (ATCC
67109). Mammalian host cells that could be used include human Hela, 283, H9
and Jurkat cells, mouse NIH3T3 and C127 cells, Cos 1, Cos 7 and CV1 African
green monkey cells, quail QC1-3 cells, mouse L cells and Chinese hamster ovary
cells.

Alternatively, the gene can be expressed in stable cell lines that contain
the gene integrated into a chromosome. The co-transfection with a selectable
marker such as dhfr, gpt, neomycin, hygromycin allows the identification and
isolation of the transfected cells.

The transfected gene can also be amplified to express large amounts of the
encoded protein. The DHFR (dihydrofolate reductase) is a useful marker to
develop cell lines that carry several hundred or even several thousand copies of
the gene of interest. Another useful selection marker is the enzyme glutamine
synthase (GS) (Murphy et al., Biochem J. 227:277-279 (1991); Bebbington et al.,
Bio/Technology 10:169-175 (1992)). Using these markers, the mammalian cells
are grown in selective medium and the cells with the highest resistance are
selected. These cell lines contain the amplified gene(s) integrated into a
chromosome. Chinese hamster ovary (CHO) cells are often used for the
production of proteins.

The expression vectors pC1 and pC4 contain the strong promoter (LTR)
of the Rous Sarcoma Virus (Cullen et al., Molecular and Cellular Biology,
438-4470 (March, 1985)) plus a fragment of the CMV-enhancer (Boshart et al.,
Cell 41:521-530 (1985)). Multiple cloning sites, e.g. with the restriction enzyme
cleavage sites BamHI, XbaI and Asp718, facilitate the cloning of the gene of
interest. The vectors contain in addition the 3' intron, the polyadenylation and
termination signal of the rat preproinsulin gene.

Example 3(a): Cloning and Expression in COS Cells 

The expression plasmid, pMAdCAM-1(a-e) HA, is made by cloning a
cDNA encoding one of MAdCAM-1(a-e) into the expression vector
pcDNAI/Amp (which can be obtained from Invitrogen, Inc.).






The expression vector pcDNAI/Amp contains: (1) an E. coli origin of
replication effective for propagation in E. coli and other prokaryotic cells; (2) an
ampicillin resistance gene for selection of plasmid-containing prokaryotic cells;
(3) an SV40 origin of replication for propagation in eukaryotic cells; (4) a CMV
promoter, a polylinker, an SV40 intron, and a polyadenylation signal arranged so
that a cDNA conveniently can be placed under expression control of the CMV
promoter and operably linked to the SV40 intron and the polyadenylation signal
by means of restriction sites in the polylinker.

A DNA fragment encoding the relevant MAdCAM-1(a-e) protein is
cloned into the polylinker region of the vector so that recombinant protein
expression is directed by the CMV promoter. The plasmid construction strategy
is as follows. The cDNA encoding the relevant MAdCAM-1(a-e) is amplified
using primers that contain convenient restriction sites, much as described above
regarding the construction of expression vectors for expression of the desired
MAdCAM-1(a-e) in E. coli.

Suitable primers include the following, which are used in this example.
The DNA sequence encoding the full length protein of any of
MAdCAM-1(a-e) is amplified using PCR oligonucleotide primers corresponding to the 5' and
3' sequences of the gene:

The 5' primer has the sequence 5' cgc ggatcc gcc atc atg gat ttc gga ctg gcc
3' (SEQ ID NO: 17) containing the underlined BamHI restriction enzyme site
followed by 18 bases of the sequence of the relevant MAdCAM-1(a-e) gene
shown in FIGS. 1-5 (SEQ ID NOs:1, 3, 5, 7, 9), respectively. Inserted into an
expression vector, as described below, the 5' end of the amplified fragment
encoding any of human MAdCAM-1(a-e) provides an efficient signal peptide.
An efficient signal for initiation of translation in eukaryotic cells, as described by
Kozak, M., J. Mol. Biol. 196:947-950 (1987) is appropriately located in the
vector portion of the construct.

The 3' primer has the sequence 5' cgc ggtacc tca ctt gaa ggg gtc caa gc 3'
(SEQ ID NO: 18) containing the underlined Asp718 restriction site followed by
nucleotides complementary to nucleotides 1183-1199 of the MAdCAM-1(a)
coding sequence given in FIG. 1.

In order to clone a gene encoding the extracellular soluble domain of
MAdCAM-1(a-e), the 5' primer, containing the underlined BamHI site, an AUG
start codon and 18 codons of the 5' coding region has the following sequence:
5' cgc ggatcc gcc atc atg gat ttc gga ctg gcc 3' (SEQ ID NO: 19).

The 3' primer, containing an XbaI site, a stop codon, and 3' coding
sequence for the extracellular domain, has the following sequence:
5' cgc tctaga tca agc gta gtc tcc gac gtc gta tgg gta 3' (SEQ ID NO:20).

The PCR amplified DNA fragment and the vector, pcDNAI/Amp, are
digested with HindIII and XhoI and then ligated. The ligation mixture is
transformed into E. coli strain SURE (available from Stratagene Cloning
Systems, 11099 North Torrey Pines Road, La Jolla, CA 92037), and the
transformed culture is plated on ampicillin media plates which then are incubated
to allow growth of ampicillin resistant colonies. Plasmid DNA is isolated from
resistant colonies and examined by restriction analysis and gel sizing for the
presence of a fragment encoding the relevant MAdCAM-1(a-e).

For expression of recombinant MAdCAM-1(a-e), COS cells are
transfected with an expression vector, as described above, using
DEAE-DEXTRAN, as described, for instance, in Sambrook et al., MOLECULAR
CLONING: A LABORATORY MANUAL, Cold Spring Harbor Laboratory Press, Cold
Spring Harbor, New York (1989). Cells are incubated under conditions for
expression of MAdCAM-1(a-e) by the vector.

Expression of the MAdCAM-1(a-e)HA fusion protein is detected by
radiolabelling and immunoprecipitation, using methods described in, for example,
Harlow et al., ANTIBODIES: A LABORATORY MANUAL, 2nd Ed.; Cold
Spring Harbor Laboratory Press, Cold Spring Harbor, New York (1988). To this
end, two days after transfection, the cells are labeled by incubation in media
containing 35S-cysteine for 8 hours. The cells and the media are collected, and the
cells are washed and then lysed with detergent-containing RIPA buffer: 150 mM
NaCl, 1% NP-40, 0.1% SDS, 0.5% DOC, 50 mM TRIS, pH 7.5, as
described by Wilson et al. cited above. Proteins are precipitated from the cell
lysate and from the culture media using an HA-specific monoclonal antibody.
The precipitated proteins then are analyzed by SDS-PAGE gels and
autoradiography. An expression product of the expected size is seen in the cell
lysate, which is not seen in negative controls.

Example 3(b): Cloning and Expression in CHO Cells 

The vector pC1 is used for the expression of any of the MAdCAM-1(a-e)
proteins. Plasmid pC1 is a derivative of the plasmid pSV2-dhfr [ATCC
Accession No. 37146]. Both plasmids contain the mouse DHFR gene under
control of the SV40 early promoter. Chinese hamster ovary or other cells
lacking dihydrofolate activity that are transfected with these plasmids can be
selected by growing the cells in a selective medium (alpha minus MEM, Life
Technologies) supplemented with the chemotherapeutic agent methotrexate. The
amplification of the DHFR genes in cells resistant to methotrexate (MTX) has
been well documented (see, e.g., Alt, F.W., Kellems, R.M., Bertino, J.R., and
Schimke, R.T., 1978, J. Biol. Chem. 253:1357-1370, Hamlin, J.L. and Ma, C.
1990, Biochem. et Biophys. Acta, 1097:107-143, Page, M.J. and Sydenham,
M.A. 1991, Biotechnology Vol. 9:64-68). Cells grown in increasing
concentrations of MTX develop resistance to the drug by overproducing the target
enzyme, DHFR, as a result of amplification of the DHFR gene. If a second gene
is linked to the DHFR gene it is usually co-amplified and over-expressed. It is
state of the art to develop cell lines carrying more than 1,000 copies of the genes.
Subsequently, when the methotrexate is withdrawn, cell lines contain the
amplified gene integrated into the chromosome(s).

Plasmid pC1 contains for the expression of the gene of interest a strong
promoter of the long terminal repeat (LTR) of the Rous Sarcoma Virus (Cullen,
et al., Molecular and Cellular Biology, March 1985, 438-4470) plus a fragment
isolated from the enhancer of the immediate early gene of human
cytomegalovirus (CMV) (Boshart et al., Cell 41:521-530, 1985). Downstream
of the promoter is a BamHI restriction enzyme cleavage site that allows the
integration of the genes. Behind this cloning site the plasmid contains
translational stop codons in all three reading frames followed by the 3' intron and
the polyadenylation site of the rat preproinsulin gene. Other highly efficient
promoters can also be used for the expression, e.g., the human β-actin promoter,
the SV40 early or late promoters or the long terminal repeats from other
retroviruses, e.g., HIV and HTLVI. For the polyadenylation of the mRNA other
signals, e.g., from the human growth hormone or globin genes can be used as
well.

Stable cell lines carrying a gene of interest integrated into the
chromosomes can also be selected upon co-transfection with a selectable marker
such as gpt, G418 or hygromycin. It is advantageous to use more than one
selectable marker in the beginning, e.g. G418 plus methotrexate.

The plasmid pC1 is digested with the restriction enzyme BamHI and then
dephosphorylated using calf intestinal phosphatase by procedures known in the art.
The vector is then isolated from a 1% agarose gel.

The DNA sequence encoding the full length protein of any of
MAdCAM-1(a-e) is amplified using PCR oligonucleotide primers corresponding to the 5' and
3' sequences of the gene:

The 5' primer has the sequence 5' cgc ggatcc gcc atc atg gat ttc gga ctg gcc
3' (SEQ ID NO: 17) containing the underlined BamHI restriction enzyme site
followed by 18 bases of the sequence of the relevant MAdCAM-1(a-e) gene
shown in FIGS. 1-5 (SEQ ID NOs:1, 3, 5, 7, 9), respectively. Inserted into an
expression vector, as described below, the 5' end of the amplified fragment
encoding any of human MAdCAM-1(a-e) provides an efficient signal peptide.
An efficient signal for initiation of translation in eukaryotic cells, as described by
Kozak, M., J. Mol. Biol. 196:947-950 (1987) is appropriately located in the
vector portion of the construct.

The 3' primer has the sequence 5' cgc ggtacc tca ctt gaa ggg gtc caa gc 3'
(SEQ ID NO:18) containing the underlined Asp718 restriction site followed by
nucleotides complementary to nucleotides 1183-1199 of the MAdCAM-1(a)
coding sequence given in FIG. 1.

The DNA sequence encoding the extracellular soluble domain of any of
the MAdCAM-1(a-e) proteins is amplified using PCR oligonucleotide primers
corresponding to the 5' and 3' sequences of the gene:

The 5' primer has the sequence 5' cgc ggatcc gcc atc atg gat ttc gga ctg gcc
3' (SEQ ID NO: 17) containing the underlined BamHI restriction enzyme site
followed by 18 bases of the sequence of the relevant MAdCAM-1(a-e) gene
shown in FIGS. 1-5 (SEQ ID NOs:1, 3, 5, 7, 9), respectively. Inserted into an
expression vector, as described below, the 5' end of the amplified fragment
encoding any of human MAdCAM-1(a-e) provides an efficient signal peptide.
An efficient signal for initiation of translation in eukaryotic cells, as described by
Kozak, M., J. Mol. Biol. 196:947-950 (1987) is appropriately located in the
vector portion of the construct.

The 3' primer has the sequence 5' cgc ggtacc tca ggg cag ctg gtc acc cgc
3' (SEQ ID NO:21) containing the underlined Asp718 restriction site followed by
nucleotides complementary to nucleotides 940-967 of the MAdCAM-1(a) coding
sequence given in FIG. 1.

The amplified fragments are isolated from a 1% agarose gel as described
above and then digested with the endonuclease BamHI and then purified again
on a 1% agarose gel.

The isolated fragment and the dephosphorylated vector are then ligated
with T4 DNA ligase. E. coli HB101 cells are then transformed and bacteria
identified that contained the plasmid pC1 inserted in the correct orientation using
the restriction enzyme BamHI. The sequence of the inserted gene is confirmed
by DNA sequencing.

Transfection of CHO-DHFR- cells

Chinese hamster ovary cells lacking an active DHFR enzyme are used for
transfection. 5 µg of the expression plasmid pC1 are cotransfected with 0.5 µg of
the plasmid pSVneo using the lipofection method (Felgner et al., supra). The
plasmid pSV2-neo contains a dominant selectable marker, the gene neo from Tn5
encoding an enzyme that confers resistance to a group of antibiotics including
G418. The cells are seeded in alpha minus MEM supplemented with 1 mg/ml
G418. After 2 days, the cells are trypsinized and seeded in hybridoma cloning
plates (Greiner, Germany) and cultivated for 10-14 days. After this period,
single clones are trypsinized and then seeded in 6-well petri dishes using different
concentrations of methotrexate (25 nM, 50 nM, 100 nM, 200 nM, 400 nM).
Clones growing at the highest concentrations of methotrexate are then transferred
to new 6-well plates containing even higher concentrations of methotrexate (500
nM, 1 µM, 2 µM, 5 µM). The same procedure is repeated until clones grow at
a concentration of 100 µM.

The expression of the desired gene product is analyzed by Western blot
analysis and SDS-PAGE.

Example 4: Tissue distribution of expression of MAdCAM-1(a-e) proteins

Northern blot analysis was carried out to examine expression of the
MAdCAM-1(a) gene in human tissues, using methods described by, among
others, Sambrook et al., cited above. A cDNA probe containing the entire
nucleotide sequence of the gene encoding the MAdCAM-1(a) protein (SEQ ID
NO:1) was labeled with 32P using the rediprime™ DNA labeling system
(Amersham Life Science), according to manufacturer's instructions. After
labeling, the probe was purified using a CHROMA SPIN-100™ column
(Clontech Laboratories, Inc.), according to manufacturer's protocol number
PT1200-1. The purified labeled probe was then used to examine various human
tissues for mRNA corresponding to MAdCAM-1(a).

Multiple Tissue Northern (MTN) blots containing various human tissues
(H) or human immune system tissues (IM) were obtained from Clontech and were
examined with labeled probe using ExpressHyb™ hybridization solution
(Clontech) according to manufacturer's protocol number PT1190-1. Following
hybridization and washing, the blots were mounted and exposed to film at -70°C
overnight, and films developed according to standard procedures.
The blots revealed that MAdCAM-1(a) is expressed strongly in small
intestine, less strongly in colon and spleen, and very weakly in pancreas and
brain.

Example 5: Sequence Analysis of Human MAdCAM-1 cDNAs and Genomic
Clones

Materials and Methods 

Isolation of human MAdCAM-1 cDNA and genomic clones

Human MAdCAM-1 cDNA was initially identified as an expressed
sequence tag (EST) following screens for homology in an EST cDNA database
(Adams, M.D., et al., Nature 377:3-174 (1995); Adams, M.D., et al., Science
252:1651-1656 (1991); Adams, M.D., et al., Nature 355:632-634 (1992)) using
the BLAST network service provided by the National Center for Biotechnology
Information. Partial-length MAdCAM-1 cDNA clones HEBBC23X and
HEBBC23Y were identified in a database from an early stage human brain cDNA
library. The library was constructed as described previously (Adams, M.D., et
al., Nature 377:3-174 (1995)) using the Lambda ZAP II vector (Stratagene, La
Jolla, California) from cDNA synthesized according to the method of Gubler and
Hoffman. A MAdCAM-1 genomic clone was subsequently isolated by screening
a cosmid library constructed in the cosmid vector pCV007 (Choo, K. H., et al.,
Gene 46: 277 (1986)). The library was replica plated onto Gene-Screen Plus
filters (DuPont, Boston, MA), and screened as described previously (Leung, E.,
et al., Int. Immunol. 5: 551-558 (1993)) with the insert of the MAdCAM-1 EST
clone labeled by random hexanucleotide priming (see Example 6).

DNA sequencing

DNA sequences were determined by cycle sequencing using Applied
Biosystems automated DNA sequenators (The Centre for Gene Technology,
School of Biological Sciences, University of Auckland, Auckland, New Zealand;
and at Human Genome Sciences Inc., Rockville, Maryland). The complete
composite MAdCAM-1 sequence obtained from genomic and cDNA clones was
determined on both strands using a combination of universal M13 primers, and
primers specific for human MAdCAM-1 sequences.

PCR amplification and identification of MAdCAM-1 splice variants

For PCR amplification to detect MAdCAM-1 variants, ten micrograms
of total RNA from human fetal brain (Clontech) in reverse transcriptase (RT)
buffer (BRL, Gaithersburg, MD) was heated to 70 °C for 3 min and then cooled
on ice. All four dNTPs were added to a final concentration of 0.5 mM, together
with 500 ng of random hexamer primers, and 400 U of Superscript RT (BRL,
Life Technologies Inc. MD, USA) in a total volume of 20 µl. The random
priming reaction was incubated at 42 °C for 2 h. Two µl of this cDNA was
subjected to 20 cycles of amplification in a thermocycler (95 °C 30 sec; 63 °C 30
sec; 72 °C 30 sec) with 100 ng primer U166+ (SEQ ID NO:22) (5'-CGC TCT
CCT TCT CCC TGC TC-3') and 100 ng of primer L776- (SEQ ID NO:23)
(5'-TGG TCG GTG GGT GTC GTC CTC A-3'), using a final dNTP
concentration of 200 µM and 2.6 U of Expand (Boehringer Mannheim). The
U166+ and L776- primers correspond to the sequences 435-454 and 1047-1068
of human MAdCAM-1. An aliquot of 2.5 µl of the PCR reaction was reamplified
for 25 cycles using the U166+ primer, and the nested primer L743- (SEQ ID
NO:24) (5'-CGG GAG GGT TTC GAG AGG TGA TAC-3') corresponding to
nucleotides 1013-1037, with the same annealing temperature. The PCR product
was ethanol precipitated and ligated into an EcoRV digested, Taq polymerase 3'
dTTP-tailed pBluescript vector, and sequenced. PCR was also used as described
above to demonstrate continuity between genomic MAdCAM-1 5'-sequences and
the MAdCAM-1 EST. Twenty cycles of amplification were carried out (95 °C
30 sec; 69 °C 45 sec; 72 °C 45 sec) with 100 ng primer U203 (SEQ ID NO:25)
(5'-GGGACTGAGCATGGATTTCGACTGGCCCT-3') and 100 ng of primer
L103 (SEQ ID NO:26) (5'-CGTACAGGCCACCTCCGGGTCACCAGGCACCA-3'),
using a final dNTP concentration of 200 µM and 2.6 U of Expand
(Boehringer Mannheim). The L103 primer corresponds to the sequence 347-405
of the human MAdCAM-1. An aliquot of 2.5 µl of the PCR reaction was
reamplified for 25 cycles using the U203 primer, and the nested primer L50-
(SEQ ID NO:27) (5'-GCTGGTCCGGGAAGGCGTACACAAGGAGCTGC-3')
corresponding to nucleotides 321-352, with the same annealing temperature. The
PCR product was ethanol precipitated and ligated into an EcoRV digested, Taq
polymerase 3' dTTP-tailed pBluescript vector, and sequenced.
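
The primer coordinates quoted above can be cross-checked against the primer lengths. The following is a minimal sketch under stated assumptions: the sequences are as read from the text (damaged bases such as the "TCG" triplet in L776- are a reading, not a certainty), the coordinates are 1-based and inclusive, and the helper names are illustrative only.

```python
# Sketch: check that each outer primer is as long as the cDNA interval it is
# said to correspond to, and compute the span between the outer primers.
# Assumption: sequences and coordinates are transcribed correctly.
PRIMERS = {
    # name: (sequence 5'->3', first nucleotide, last nucleotide)
    "U166+": ("CGCTCTCCTTCTCCCTGCTC", 435, 454),
    "L776-": ("TGGTCGGTGGGTGTCGTCCTCA", 1047, 1068),
}


def span_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive interval."""
    return end - start + 1


def primers_match_coordinates(primers: dict) -> dict:
    """Map each primer name to True if its length equals its stated span."""
    return {
        name: len(seq) == span_length(start, end)
        for name, (seq, start, end) in primers.items()
    }


length_ok = primers_match_coordinates(PRIMERS)

# Span covered by the outer primer pair on an unspliced (full-length) cDNA.
outer_product_bp = span_length(PRIMERS["U166+"][1], PRIMERS["L776-"][2])

print(length_ok, outer_product_bp)
```

Products shorter than this span would be candidates for internally deleted transcripts, which is the reasoning behind using the amplification to detect splice variants.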

Northern blot analysis

For northern analysis, MTN (Clontech) filters were screened with the
insert of the MAdCAM-1 EST clone labeled by random hexanucleotide priming.
The conditions of hybridization were 1% SDS, 2 x SSC, 10% (w/v) dextran
sulphate, 100 µg/ml denatured salmon sperm DNA, and 50% (v/v) deionized
formamide at 50°C. Filters were washed twice in 0.1 x SSC, 0.1% SDS at 50°C
for 30 min, and autoradiographed using XAR-5 film and Cronex Lightning Plus
screens.

Results and discussion 

A database of human ESTs was searched for homologs of mouse
MAdCAM-1 by using the BLAST algorithm (Altschul, S. F., et al., J. Mol. Biol.
215:403-410 (1990)). Partial overlapping MAdCAM-1 cDNA clones
HEBBC23X and HEBBC23Y were initially identified from an early stage human
brain cDNA library (Figure 8A). They were sequenced on both strands and
together encoded the MAdCAM-1 sequence from a position corresponding to
amino acid residue 89 of the mouse MAdCAM-1 cDNA clone pMAd-?, to the
end of the 3'-untranslated region. HEBBC23Y and X encoded from nucleotide
positions 273 to 858, and 544 to 1536, respectively, of the human MAdCAM-1
sequence. In order to obtain the missing 5'-end sequence, the early stage brain
library was rescreened, as well as five other brain, pancreatic, and adult and fetal
spleen cDNA libraries, but no clones that extended the sequence were obtained.
As an alternative approach, fetal brain mRNA was subjected to rapid
amplification of cDNA ends (RACE), but despite exhaustive attempts the
MAdCAM-1 5'-sequence remained elusive. As a last resort 100,000 colonies of
a genomic library in the cosmid vector pCV007 (Choo, K. H., et al., Gene 46:
277 (1986)) were screened with the MAdCAM-1 EST cDNA clone (see
Example 6). Of several clones isolated, one strongly hybridizing clone,
MAD-Cl, was characterized and found to contain the missing sequence on a 5 kb
Sac I-Sac I fragment. Continuity between the cosmid and cDNA sequences was
established by RT-PCR from fetal brain RNA using a sense primer U203 to
putative genomic MAdCAM-1 5'-untranslated and signal peptide sequence, and
nested antisense primers L50 and L103 to the 5'-end of the EST clone (see
Methods and Example 6).

The composite nucleotide and deduced amino acid sequences of the
MAdCAM-1 HEBBC23X cDNA clone, the genomic clone MAD-Cl, and the 5'-
PCR product are given in Fig. 8. The nucleotide sequence of 1546 bp ends with
the polyadenylation signal AAATAAA (SEQ ID NO:28), followed 15 bases
further by a poly(A) stretch. Ten bp of the 5'-untranslated sequence has been
added for completeness. The open reading frame beginning with an ATG at
position 1 encodes a protein of 382 amino acid residues. The ATG start codon,
which is flanked by the consensus sequence Pur XXAUG Pur (SEQ ID NO:29),
is followed by a predominantly hydrophobic segment of 18 amino acid residues
characteristic of a signal peptide. A hydropathicity plot of the deduced amino
acid sequence (Fig. 7) revealed a sequence presumed to be the transmembrane
domain, encompassing residues 320 to 339. Thus, the sequence predicts a
transmembrane bound protein comprised of a predominantly hydrophilic 103
amino acid extracellular domain, a 20 amino acid transmembrane segment, and
a 43 amino acid cytoplasmic domain, with an Mr of 38,340. There is a single
potential N-linked glycosylation site at amino acid position 83.
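
The residue bookkeeping in the preceding paragraph can be laid out explicitly. The following is a minimal sketch, assuming the signal peptide occupies residues 1-18, the transmembrane segment residues 320-339, and the cytoplasmic domain runs from residue 340 to the end of the 382-residue protein. The `Segment` class is a hypothetical helper introduced only for this illustration. Under those assumptions the residues between the signal peptide and the transmembrane segment number 301, a figure worth comparing against the extracellular domain length quoted above; the sketch is only a consistency check on the stated coordinates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A 1-based, inclusive residue interval (hypothetical helper)."""
    name: str
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start + 1


PROTEIN_LENGTH = 382  # residues encoded by the open reading frame

SIGNAL = Segment("signal peptide", 1, 18)
TM = Segment("transmembrane", 320, 339)
CYTO = Segment("cytoplasmic", TM.end + 1, PROTEIN_LENGTH)
# Whatever lies between the signal peptide and the transmembrane segment.
ECTO = Segment("extracellular", SIGNAL.end + 1, TM.start - 1)

segments = [SIGNAL, ECTO, TM, CYTO]
total = sum(s.length for s in segments)

for s in segments:
    print(f"{s.name:15s} {s.start:3d}-{s.end:3d}  {s.length:3d} aa")
print("total", total)
```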

The deduced amino acid sequence revealed a 17 amino acid signal
peptide, two immunoglobulin (Ig)-like domains, an 86 amino acid mucin-like
region rich in serine/threonine residues, a 20 amino acid transmembrane domain,
and a 43 amino acid charged cytoplasmic domain. The sequences of the two
N-terminal Ig-like domains are highly conserved (59-65%) with the corresponding
receptor-binding Ig domains of mouse MAdCAM-1. No counterpart to the third
IgA-like domain of mouse MAdCAM-1 was present, and instead the
serine/threonine-rich mucin domain has been extended as two distinguishable
regions, here designated the major and minor mucin domains. The major domain
is formed from six tandem repeats of an eight amino acid sequence having the
consensus DTTSPEP/SP (SEQ ID NO:30), which is similar to the imperfect
repeats of the intestinal mucin MUC-2. The mucin domains of the MAdCAM-1
human/mouse species homologs are distinct, in accord with the notion that mucin
domains are not phylogenetically conserved. Human MAdCAM-1 mRNA
transcripts were restricted to small intestine, colon, spleen, pancreas, and brain
which is a further indication that the clones encode MAdCAM-1. Alternatively
spliced MAdCAM-1 variants were identified that lack all or part of the second Ig
domain, and all or part of the major mucin domain, indicating that the function
of this vascular addressin might be regulated by extensive modifications to its
multidomain structure.
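
The tandem-repeat consensus described above lends itself to a simple pattern search. The following is a minimal sketch, assuming the consensus DTTSPEP/SP means an eight-residue unit whose seventh position is either P or S. The test sequence is hypothetical and is not the MAdCAM-1 sequence; because the real repeats are described as imperfect, a strict pattern of this kind would undercount them.

```python
import re

# Eight-residue consensus unit: D-T-T-S-P-E-[P or S]-P (assumed reading).
REPEAT_UNIT = r"DTTSPE[PS]P"
REPEAT_LEN = 8

# One or more units placed back to back.
TANDEM_RUN = re.compile(rf"(?:{REPEAT_UNIT})+")


def longest_tandem_run(protein: str) -> int:
    """Return the number of units in the longest uninterrupted repeat run."""
    best = 0
    for match in TANDEM_RUN.finditer(protein):
        best = max(best, len(match.group(0)) // REPEAT_LEN)
    return best


# Hypothetical sequence: flanking residues around six perfect repeats.
example = "AAQ" + "DTTSPEPP" * 3 + "DTTSPESP" * 3 + "GKL"

n_repeats = longest_tandem_run(example)
print(n_repeats, n_repeats * REPEAT_LEN)
```

Six units of eight residues cover 48 residues, which is the arithmetic behind the size of the major mucin domain.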

The extracellular domain comprises two Ig-like domains of 52 and 69
amino acid residues, respectively, each possessing the invariant cysteine residues
that stabilize the immunoglobulin loop, with the first domain having doublet
cysteines. There is a mucin-like 48 amino acid residue domain encompassing
residues 226-273, which is rich in serine, threonine, and proline residues (71%).
The mucin domain is formed from six tandem repeats of an eight amino acid
sequence having the general consensus DTTSPEP/SP (SEQ ID NO:30). The
repeats are highly conserved with one another (75-100%), suggesting that they
arose by duplication. This domain has 19 potential sites for O-linked
glycosylation. The mucin-like nature of the region extends to a lesser degree as
far as the transmembrane domain, since the serine/threonine/proline content is
still quite high (43%). We designate this latter region (positions 278 to 311) as
the minor mucin domain, and the mucin tandem repeats immediately 5' as the
major mucin domain. A search of the NBRF database revealed that human
MAdCAM-1 was most similar to mouse MAdCAM-1, but striking homologies
were also identified with VCAM-1 and ICAM-1. Alignment of the human and
mouse sequences (not shown) revealed an overall weak similarity of 39%.
However, Ig domains 1 and 2 in particular have been highly conserved, 59 and
65%, respectively; and similarity increases to 69 and 81%, respectively, when
conservative substitutions are included. This is to be expected since these two Ig
domains interact to support binding to the LPAM-1 receptor, and both domains
are required for full function. The membrane-proximal regions of the
extracellular domains of human and mouse MAdCAM-1 are peptide backbones
designed for decoration with complex O-linked carbohydrate moieties for
presentation to L-selectin, and as such, only the serine/threonine/proline content
needs to be conserved. Hence, after the first mucin repeat there is little similarity
between the human and mouse sequences, except for the transmembrane domain
which is 55% identical. The short charged cytoplasmic domains share only 35%
identity, and the human sequence extends 24 amino acid residues further than the
mouse sequence. Clone HEBBC23X lacks an equivalent of the third Ig domain
of mouse MAdCAM-1. A truncated mouse MAdCAM-1 variant has been
identified in which exon 4 is spliced out removing both the mucin domain and the
third Ig domain (Sampaio et al., J. Immunol. 155: 2477-86 (1995)). The third Ig
domain of mouse MAdCAM-1 is strikingly similar to the Cα1 constant region
immunoglobulin loop of human and gorilla IgA1 (Briskin et al., Nature 363:461-
64 (1993)). It was suggested that it may be able to interact with IgA-specific Fcα
receptors or related surface receptors on mucosal T cells, given that the Cα2
constant region mediates IgA interactions with the poly-immunoglobulin Fc
receptor. It remains plausible that an Ig domain with a mucosal function is not
needed in the human brain, and that a human counterpart to the three Ig domain
form cloned from the mouse hiN4Ad-4 brain endothelioma cell line might be
expressed in human PP or mesenteric venules. Completion of the sequence
analysis of the MAdCAM-1 cosmid clone should resolve this point.

Human MAdCAM-1 may have compensated for a lack of a third Ig
domain by having two mucin domains to hold the two N-terminal ligand-binding
domains above the glycocalyx for presentation to LPAM-1. In mouse there are
108 amino acid residues separating the mucin domain from the transmembrane
domain compared to only 46 residues separating the major mucin domain from
the transmembrane domain in human MAdCAM-1. The distances may not be so
dissimilar given that the third Ig domain of mouse MAdCAM-1 is a loop
structure, whereas the extended mucin domain in human MAdCAM-1 is probably
rod-like as are the mucin repeats of MUC-1 (Fontenot et al., Cancer Res. 53:
5386-94 (1993)). The repeats in the major mucin domain may have been
inserted, possibly by a gene conversion event involving a mucin gene, to enrich
the overall content of serine/threonine residues (40% in major domain) and to
enable better presentation to L-selectin by positioning the major mucin repeat
above the glycocalyx.

A search of the NBRF database with the sequence of the tandem repeats
of the major mucin domain revealed most similarity (up to 62% including
conservative substitutions) with a region of imperfect repeats in the human
intestinal mucin MUC-2. MUC-2 contains two distinct regions with a high
degree of internal homology (Toribara et al., J. Clin. Invest. 88: 1005-13
(1991)). There is a region of imperfect repeats that range from 7 to 40 amino
acids, with the most common length being 16 amino acids. This 385 amino acid
region has a high threonine (47.8%), proline (35.6%) and serine (10.6%) content.
It is this region with which MAdCAM-1 shares similarity (Fig. 2). The major
MAdCAM-1 tandem repeat domain is not as rich in such residues, and 22% of
the dissimilar amino acids are acidic residues which are totally absent from the
imperfect repeats of MUC-2. In MUC-2 there is also a 3' region composed of 69
bp tandem repeats arranged in an array of up to 115 units, which is not similar to
the MAdCAM-1 mucin region despite having a high serine/threonine/proline
content (87%) (Zrihan-Licht et al., Eur. J. Biochem. 224: 787-95 (1994)). The
human intestinal mucin MUC-1 has a serine/threonine/proline-rich 20 amino acid
residue domain (SEQ ID NO:31) PDTRPAPGSTAPPAHGVTSA, repeated up
to 200 times (Gum et al., J. Biol. Chem. 266: 22733-38 (1991)), and rat intestinal
mucin has the repeat sequence (SEQ ID NO:32) TTTPDV (Spicer et al., J. Biol.
Chem. 266: 15099-109 (1991)), but neither of these sequences bears similarity to
MAdCAM-1. Repetitive portions of intestinal mucin genes are not well
conserved phylogenetically, and this may explain the divergence of the human
and mouse MAdCAM-1 3' sequences (Vos et al., Biochem. Biophys. Res.
Commun. 181: 121-30 (1991); Shimizu & Shaw, Nature 366: 630-31 (1993)).
Thus the primary function of the MAdCAM-1 mucin repeats is probably purely
to provide a framework for extensive O-linked glycosylation.

MAdCAM-1 clone HEBBC23Y appears to be a splice variant in which 3
mucin repeats are missing (amino acid residues 231-254) (Figs. 8A, 10). In
order to determine whether additional mucin domain splice variants might exist,
MAdCAM-1 transcripts were amplified from human fetal brain using sense and
antisense PCR primers designed to the start of Ig domain 2 and the cytoplasmic
domain, respectively. Several novel splice variants were identified including one
which lacked almost all of the second Ig domain and all the major mucin repeats;
and two others which had lost half of Ig domain 2 and 2 to 3 mucin repeats (Fig.
10A). Several of these alternatively spliced transcripts could be accommodated
in the broad band seen on northerns, whereas those with larger deletions may be
more weakly expressed and visible as a faint leader smear, as is the case for the
alternatively spliced variant of mouse MAdCAM-1 (Sampaio et al., J. Immunol.
155: 2477-86 (1995)). None of the splice sites correlate with exon/intron
boundaries identified in the mouse MAdCAM-1 gene, and hence they probably
represent internal splice donor and acceptor sites within the respective exons (Fig.
10B). Alternative splicing of human MAdCAM-1 is in accord with alternative
splicing of its mouse homologue. A proposed single Ig domain form (Fig. 11)
containing just Ig domain 1 is interesting since analysis of the structural
requirements for mouse MAdCAM-1 ligand-binding revealed both N-terminal
Ig domains were required for full function. Nevertheless a mouse MAdCAM-1
chimeric molecule lacking Ig domain 2 could bind to LPAM-1 (to a lesser
extent), but only after integrin activation. The proposed naturally occurring Ig
domain 2-deficient form of MAdCAM-1, identified in this report, may prove to
be specialized to be more sensitive to the activation/inactivation status of
LPAM-1.

The regulation of mucin adhesion by alternative splicing is well
established (Thomas, M. L., Annu. Rev. Immunol. 7: 339-69 (1989)). The
leukocyte common antigen family, for instance, is generated by alternative
splicing of three exons encoding a mucin-like region (Berg et al., Cellular and
Molecular Mechanisms of Inflammation 2:111-29 (1991)). The MAdCAM-1
variants described in this report possess from 0 to 6 mucin repeats (Fig. 11), and
might be expected to vary in their affinity for L-selectin. Whether there are
spatio/temporal patterns or stochastic expression of alternatively spliced forms
of MAdCAM-1 on the surface of venules remains to be determined.

Multiple Human Tissue mRNA Northern blots (MTN and MTNII,
Clontech) were probed with the cDNA clone HEBBC23X revealing a transcript
of 1.6 kb, expressed very strongly in small intestine, less strongly in colon and
spleen, and very weakly in pancreas and brain. These results are consistent with
northern and immunohistological studies of mouse MAdCAM-1, which revealed
expression in PP, MLN, at low levels in PLN, and some expression in the
marginal sinus around splenic white pulp nodules in the spleen (Briskin et al.,
Nature 363:461-64 (1993); Kraal et al., Am. J. Pathol. 147: 763-771 (1995)).

In summary, several features of mouse MAdCAM-1 have been stringently 
conserved in humans. This includes the tissue distribution of human 
MAdCAM-1, and the structure of the two Ig ligand-binding domains; yet the 
3'-region is quite divergent. In accord with the regulation of other mucins and 
IgCAMs, the function of human MAdCAM-1 is likely to be regulated by 
extensive alternative splicing as evidenced by the variant forms described herein. 

Example 6: Genomic Organization and Mapping of the Human MAdCAM-1
Gene; Analysis of the 5'-Promoter Region

Materials and Methods 

Isolation of MAdCAM-1 cosmid and genomic phage clones

The two human genomic libraries screened were a Stratagene λ FIX II
library prepared from human placenta genomic DNA digested with Mbo I, and a
cosmid library constructed in the vector pAVCV007 from DNA partially
digested with Mbo I. The cosmid library was replica plated onto Gene-Screen
Plus filters (Du Pont, Boston, MA), and screened with the Xho I-EcoR I
32P-labeled 500 bp insert of the MAdCAM-1 cDNA clone PCR Y.
Positive clones HEBBC23592 and GM3 were isolated from the phage and cosmid
libraries, respectively.

Subcloning of restriction fragments and sequence determination 

Restriction enzyme fragments of the genomic clones were subcloned into
pBluescript and sequenced with a panel of oligonucleotide primers designed to 
the MAdCAM-1 cDNA sequence. DNA sequence was determined by cycle 
sequencing using an Applied Biosystems 373A automated DNA sequencer 
(School of Biological Sciences, University of Auckland, Auckland). The entire 
transcribed regions of the MAdCAM-1 gene, previously defined by the 
MAdCAM-1 cDNA, were identified and sequenced. Exon-intron boundaries 
were assigned by direct comparison of the cDNA and genomic sequences, and 
according to the GT/AG rule for splicing. The determined DNA sequence has 
been submitted to GenBank databank. 

Chromosomal mapping of the human MAdCAM-1 gene

A combination of PCR analysis of a panel of human/rodent somatic cell
hybrids and fluorescence in situ hybridization (FISH) to human metaphase
chromosomes was used to define the chromosomal location of the MAdCAM-1
gene. Fourteen of the cell hybrids contained a single human chromosome,
whereas the remaining 10 contained 2 to 3 chromosomes, or 1 to 3 chromosomal
fragments. Two primers U707 and L1072 were designed to nucleotide positions
978-999 (SEQ ID NO:39) (TGC GGT GCT GGG ACT GCT GCT C, sense) and
1344-1364 (SEQ ID NO:40) (TCA GGG AGG GGC TTC AGG TCA, antisense)
of the MAdCAM-1 cDNA sequence, respectively. They amplified a PCR
product of 386 bp from human DNA, but not from mouse or hamster DNA. The
PCR conditions were: 5 min at 95°C, followed by 30 cycles of 94°C for 30 s,
60°C for 30 s, 72°C for 30 s, and a final extension at 72°C for 5 min.
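
The cycling conditions above can be written down as a small program description, which makes the total hold time easy to compute. The following is a minimal sketch, assuming the extension and final-extension temperature is 72°C (the source is garbled at that point) and ignoring ramp times between temperatures. The `Step` type and function name are illustrative only.

```python
from typing import NamedTuple


class Step(NamedTuple):
    """One hold in a thermocycler program (hypothetical helper type)."""
    temp_c: int
    seconds: int


# Program as read from the text; 72 C for extension is an assumed reading.
INITIAL_DENATURATION = Step(95, 5 * 60)
CYCLE = [Step(94, 30), Step(60, 30), Step(72, 30)]
N_CYCLES = 30
FINAL_EXTENSION = Step(72, 5 * 60)


def total_hold_seconds() -> int:
    """Sum of all programmed holds, excluding temperature ramps."""
    per_cycle = sum(step.seconds for step in CYCLE)
    return (INITIAL_DENATURATION.seconds
            + N_CYCLES * per_cycle
            + FINAL_EXTENSION.seconds)


total_seconds = total_hold_seconds()
print(f"{total_seconds} s = {total_seconds / 60:.0f} min of programmed holds")
```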

The precise regional localization of the MAdCAM-1 gene was determined
by single copy gene fluorescence in situ hybridization (FISH) to human male
metaphase chromosome spreads. Briefly, a 1.3 kb MAdCAM-1 cDNA was
nick-translated using digoxigenin-11-dUTP (Boehringer Mannheim), and FISH was
carried out. Individual chromosomes were counterstained with
4',6-diamidino-2-phenylindole-2HCl (DAPI). Color digital images containing both DAPI bands
and gene signal detected with anti-digoxigenin-tagged rhodamine fluorescent
label were recorded using a triple-band pass filter set (Chroma Technology, Inc.,
Brattleboro, VT) in combination with a charge-coupled device camera
(Photometrics, Inc., Tucson, AZ) and variable excitation wavelength filters.
Images were analyzed using the ISEE software package (Inovision Corp.,
Durham, N.C.).

Construction of human MAdCAM-1-luciferase fusion genes for assays of
promoter activity

A 700 base pair fragment encoding a region immediately 5' of the
MAdCAM-1 gene and including the translational start site was PCR amplified
from a Sac I-Pst I subclone of the cosmid clone pGM3 using the T7 forward
primer (SEQ ID NO:41) (5'-GTA ATA CGA CTC ACT ATA GG-3'; sense) and
the MAdCAM-1-specific antisense primer MAD-2 (SEQ ID NO:42) (5'-AGG
GCC AGT CCG AAA TCC ATG CTC AGT CCC-3'). The PCR product was
subcloned into the EcoRV site in pBluescript, excised with Hind III and
subcloned into the pGL-2 Basic vector (Promega, Madison, WI) which contains
a firefly luciferase reporter gene. The insert of the clone created, pGL-2/B-718,
was sequenced, confirming that no PCR errors had been incorporated.

Genomic organisation of the human MAdCAM-1 gene

In order to isolate the MAdCAM-1 gene, 200,000 colonies of a genomic
library in the cosmid vector pAVCV007 were screened with the MAdCAM-1
cDNA clone PCR Y that encodes from nucleotide positions 273 to 858. Of two
clones obtained, the longest, GM3, contained the entire gene, and 5'-untranslated
region, but did not contain exons encoding the transmembrane and cytoplasmic
domains, and 3'-untranslated region. The missing portion of the MAdCAM-1
gene was located on clone HEBBC23592, isolated by screening plaques from a
FIX II genomic library with a 1.3 kb MAdCAM-1 cDNA probe. Southern blot,
PCR, and DNA sequence analysis demonstrated that clone HEBBC23592
contained at least exons 3 to 5 of the MAdCAM-1 gene.

DNA sequencing revealed that the coding portion of the MAdCAM-1
gene is contained within 5 exons, with the sequences being identical to the
MAdCAM-1 cDNA sequence. All intron-exon splice junction sequences are in
agreement with the GT/AG rule for splicing. The introns are all type I, where
interruption occurs after the first nucleotide of a codon. The first exon (52 bp)
encodes the signal peptide and 5'-untranslated sequence; exons 2 and 3 encode
the N-terminal Ig domains; exon 4 encodes the mucin domain; and exon 5
encodes the transmembrane and cytoplasmic domains, and the 3' untranslated
region.
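
The statement that every intron interrupts a codon after its first nucleotide can be expressed as a remainder calculation. The following is a minimal sketch: the coding lengths used are hypothetical placeholders chosen only to illustrate the arithmetic and are not the measured exon sizes of the MAdCAM-1 gene. It also shows a general consequence of the rule: when all introns share the same phase, each internal exon contributes a multiple of three coding nucleotides, so removing a whole internal exon leaves the reading frame intact.

```python
from itertools import accumulate


def intron_phase(coding_nt_before_intron: int) -> int:
    """Phase of an intron: 0 between codons, 1 after the first nucleotide
    of a codon, 2 after the second nucleotide."""
    return coding_nt_before_intron % 3


# HYPOTHETICAL coding lengths (nt) contributed by exons 1 to 4; the real
# values are not given in the text and are assumed here for illustration.
coding_lengths = [49, 288, 345, 330]

# Coding nucleotides accumulated before introns 1, 2, 3 and 4.
cumulative = list(accumulate(coding_lengths))
phases = [intron_phase(n) for n in cumulative]

# Internal exons (2, 3, 4) flanked by same-phase introns are frame-neutral.
internal_frame_neutral = [length % 3 == 0 for length in coding_lengths[1:]]

print(phases, internal_frame_neutral)
```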

Comparison of the human and mouse MAdCAM-1 genes

Alignment of the human and mouse MAdCAM-1 gene sequences
revealed that three of four intron-exon junctions separating the signal peptide and
Ig domain sequences were conserved in position. The MAdCAM-1 mucin-like
domains are not conserved between species, and the exon-intron splice sites
separating the mucin and transmembrane domain sequences are also not
conserved. In humans, the splice site is nine amino acids N-terminal to the
boundary of the extracellular and transmembrane domains, whereas in mouse it
is three amino acids N-terminal to that boundary.

A splice variant of human MAdCAM-1 lacks exon 4 encoding the mucin
domain

Splice variants of human MAdCAM-1, where the variant forms lack all
or part of the second Ig domain, and all or part of the major mucin domain, are
described above in Example 5. Comparison with the MAdCAM-1 genomic
sequence confirms that all four splice variants were derived by internal splicing
of exons, unlike the single splice variant identified for mouse MAdCAM-1 which
is formed by splicing out exon 4, which encodes the mucin/IgA-like Ig domain.
Further splice variants of 250 (minor), 350 (major), and 500 (minor) bp in size,
compared to a full-length PCR product of 700 bp, were amplified from human
fetal brain. Shotgun subcloning and sequencing revealed an equivalent of the
mouse exon 4 splice variant, encoded by 340 bp of DNA. Comparison with the
genomic sequence reveals that this new MAdCAM-1 variant is created by
alternative splicing and deletion of exon 4, which encodes the entire mucin-like
domain.

Analysis of the 5'-flanking region of the MAdCAM-1 gene

A 700 bp 5'-flanking region of the MAdCAM-1 gene was sequenced,
revealing several potential transcriptional regulatory elements. These include
two tandem NF-κB binding sites at positions -98 and -110 with respect to the
translational start codon; thirteen SP-1 sites at -66, -141, -157, -164, -177, -189,
-308, -322, -338, -590, -647, -664, -678; nine AP-2 sites at -66, -157, 204,
-325, -544, -549, -694, -591, -204; PEA3 (ets family) sites at -115, -212; an
NF-E1 site at -522; Adhl (ETF) sites at -95, -187; a GC box at -176; a MyoD
site at -582; an E2A site at -85; an ENKCRE (SEQ ID NO:43) site at -496; and
an IRS site at -354. Only the tandem NF-κB sites, the SP-1 site at -590, and a
potential TATA box (TATTTAA; at position -38) (SEQ ID NO:44) identified
in the mouse promoter are conserved in position (Fig. 13). Despite this, the 367
bp promoter region immediately flanking the MAdCAM-1 gene is highly
conserved (79%) with the corresponding region of the mouse promoter
(Fig. 13).
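
The site positions listed above are easier to work with as a small table. The following is a minimal sketch that records only the NF-κB and SP-1 positions, assuming they are transcribed correctly (the second NF-κB position is read as -110) and that all positions are relative to the translational start codon. The dictionary layout and function name are illustrative only.

```python
# Positions relative to the translational start codon, as read from the text.
PROMOTER_SITES = {
    "NF-kB": [-98, -110],
    "SP-1": [-66, -141, -157, -164, -177, -189, -308, -322, -338,
             -590, -647, -664, -678],
}


def sites_within(positions: list, upstream_bp: int) -> list:
    """Return the positions lying within `upstream_bp` of the start codon."""
    return [p for p in positions if -upstream_bp <= p < 0]


sp1_total = len(PROMOTER_SITES["SP-1"])

# SP-1 sites inside the 367 bp proximal region compared with mouse.
sp1_proximal = sites_within(PROMOTER_SITES["SP-1"], 367)

# Distance between the two tandem NF-kB sites.
nfkb_spacing = abs(PROMOTER_SITES["NF-kB"][0] - PROMOTER_SITES["NF-kB"][1])

print(sp1_total, len(sp1_proximal), nfkb_spacing)
```

Counting the entries confirms that thirteen SP-1 positions are listed, nine of which fall inside the 367 bp proximal region.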

The pGL-2/B-718+ and pGL-2/B-718- constructs, which contain a 700 bp
fragment of the MAdCAM-1 gene 5'-flanking sequence (nucleotide positions
-718 to +20 relative to the translational start) fused to the luciferase reporter
gene (Fig. 14A), were used in transient transfection assays to test for promoter
activity. Promoter activity was tested in PMA-treated and untreated HMEC cells,
a human dermal endothelial cell line which constitutively produces MAdCAM-1
RNA (Fig. 14B). The reporter construct directed a low but consistent level of
luciferase activity in unactivated cells as compared to the pGL-2/B basic control
vector, and the control pGL-2/-718- vector containing the promoter in the
incorrect orientation. The activity of the pGL-2/B-718+ vector was doubled
following cell stimulation with PMA, in comparison to the pGL-2/-718- vector
control (Fig. 14C).

Chromosomal assignment of the human MAdCAM-1 gene 

Genomic DNAs from a panel of 24 human-rodent somatic cell hybrids,
the majority of which were monochromosomal, were analyzed by PCR using
PCR primers directed to the MAdCAM-1 sequence. The expected 386 bp PCR
fragment was specifically amplified from human DNA, but not from mouse or
hamster DNAs, and was specifically obtained from a hybrid cell line (GM10612)
containing only human chromosome 19.
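
The logic of assigning a gene with a somatic cell hybrid panel is a concordance test: the gene lies on the chromosome whose presence matches the PCR result in every hybrid. The following is a minimal sketch. Apart from GM10612, which the text describes as containing only chromosome 19, the hybrid names, their chromosome contents, and the PCR calls are hypothetical and serve only to illustrate the procedure.

```python
def concordant_chromosomes(panel: dict, pcr_positive: dict) -> list:
    """Return chromosomes whose presence/absence matches the PCR result
    in every hybrid of the panel."""
    all_chromosomes = set().union(*panel.values())
    return [
        chrom
        for chrom in sorted(all_chromosomes)
        if all((chrom in panel[h]) == pcr_positive[h] for h in panel)
    ]


# Human chromosomes retained by each hybrid (HYPOTHETICAL except GM10612).
panel = {
    "GM10612": {19},
    "hybrid_A": {1},
    "hybrid_B": {7, 19},
    "hybrid_C": {2, 7},
}

# Whether the 386 bp product was amplified (HYPOTHETICAL calls).
pcr_positive = {
    "GM10612": True,
    "hybrid_A": False,
    "hybrid_B": True,
    "hybrid_C": False,
}

assignment = concordant_chromosomes(panel, pcr_positive)
print(assignment)
```

Multichromosomal hybrids are informative in this scheme because a negative result excludes every chromosome they carry.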

The MAdCAM-1 gene was regionally localized to chromosome 19 by in
situ hybridisation of metaphase chromosomal spreads with the 1.3 kb cDNA
insert of MAdCAM-1 cDNA clone HEBBC23X (see Example 5). Approximately
thirteen spreads were analyzed by eye, most of which had a doublet signal
characteristic of genuine hybridization on at least one chromosome 19. Doublet
signals were not detected on any other chromosome. Detailed analysis of 12
individual chromosomes, using fluorescence banding combined with high
resolution image analysis, indicated that the MAdCAM-1 gene is positioned
within band 19p13.3.

Discussion 

The genomic organization of the MAdCAM-1 gene correlates well with 
the subdomain structure of the encoded protein. The 5'-untranslated region and 
signal peptide are encoded by exon 1, the two N-terminal Ig domains and mucin- 
like domain are encoded by exons 2, 3, and 4, respectively, and the 
transmembrane and cytoplasmic domains and 3'-untranslated region are combined 
together on exon 5. Several features of MAdCAM-1 have been conserved 
between humans and mice, including the structure of the two Ig ligand-binding 
domains, yet the 3'-region is quite divergent. Comparison of the human gene 
sequence with the mouse homologue revealed that differences in organization of 
the 3'-region are not simply due to alternative splicing, but are inherent in the 
genomic DNA. Thus the human MAdCAM-1 gene contains no sequence 
equivalent to the third IgA-homologous domain of mouse MAdCAM-1 adjoining 
the 3'-end of the mucin domain. It is possible that a third Ig domain exists as a 
separate exon in the large intron separating exons 4 and 5, but given all the 
available evidence, and in particular sequence analysis of MAdCAM-1 splice 
variants from RT-PCR analysis, this seems unlikely. Despite this major 
difference in gross structure, other regions of human and mouse MAdCAM-1 are 
highly conserved, including the positions of four of the five intron-exon splice 
junctions, highlighting the close evolutionary relationship between the molecules. 

Four splice variants were identified by RT-PCR that lacked all or part of 
the second Ig domain, and all or part of the major mucin domain. Comparison 
with the genomic sequence reveals that all the variants arose by internal splicing 
of exons. Intra-exonic splicing of MAdCAM-1 is further substantiated by the fact 
that our original MAdCAM-1 cDNA clone HEBBC23X has only six major mucin 
repeats, whereas a human MAdCAM-1 clone isolated from a mesenteric lymph 
node library contained eight such repeats. A MAdCAM-1 variant containing just 
six repeats has also been independently isolated by RT-PCR. It was therefore of 
interest to determine that the total possible number of repeats in the major mucin 
domain, contained within exon 4 of the MAdCAM-1 gene, is in fact eight. The 
regulation of mucin adhesion by alternative splicing is well established, and 
MAdCAM-1 appears to be no exception. The human MAdCAM-1 variant 
created by the splicing out of exon 4 encoding the mucin domain (described 
above in Example 5) is a counterpart to the splice variant identified in mouse 
MAdCAM-1 which lacks exon 4, encoding the mucin and third IgA-like domain. 
Despite the prominence of the splice variants identified by PCR, they are not 
abundant in Northerns. Nevertheless it will be important to study the 
topographical and tissue distribution of the various MAdCAM-1 splice variants, 
given that absence or truncations of the mucin domain will affect the ability of 
MAdCAM-1 to facilitate lymphocyte tethering under flow to L-selectin. 
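
One simple consistency check on splice variants of this kind is whether each coding sequence, and its difference in length from a reference, is a whole number of codons. The sketch below performs that arithmetic on the CDS feature coordinates given in the sequence listing of this document for SEQ ID NOS: 1, 3, 5, 7 and 9. Using SEQ ID NO:1 as the reference is an assumption made purely for illustration, and the function names are hypothetical.

```python
# CDS feature locations (1-based, inclusive) as given in the sequence listing.
CDS_LOCATIONS = {
    "SEQ ID NO:1": (1, 1146),
    "SEQ ID NO:3": (1, 1098),
    "SEQ ID NO:5": (1, 789),
    "SEQ ID NO:7": (1, 930),
    "SEQ ID NO:9": (1, 939),
}


def cds_length(location):
    """Length in nucleotides of a 1-based, inclusive (start, end) location."""
    start, end = location
    return end - start + 1


def frame_report(locations, reference="SEQ ID NO:1"):
    """Map each name to (codon count, nt shorter than reference, in-frame flag)."""
    ref_len = cds_length(locations[reference])
    report = {}
    for name, location in locations.items():
        length = cds_length(location)
        missing = ref_len - length
        in_frame = length % 3 == 0 and missing % 3 == 0
        report[name] = (length // 3, missing, in_frame)
    return report


for name, (codons, missing, in_frame) in frame_report(CDS_LOCATIONS).items():
    print(f"{name}: {codons} codons, {missing} nt shorter than reference, in frame: {in_frame}")
```

With these coordinates every CDS length is a multiple of three, and the codon counts for SEQ ID NOS: 1, 3, 5 and 7 (382, 366, 263 and 310) agree with the lengths stated in the listing for the corresponding protein sequences, SEQ ID NOS: 2, 4, 6 and 8.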

Sequence analysis of the 5'-region of the human MAdCAM-1 gene 
revealed close similarity to the mouse MAdCAM-1 gene promoter. The two 
tandem NF-κB sites located 100 bp upstream of the start site of transcription in 
the mouse promoter are conserved in position. Transfection assays in the murine 
endothelial cell line bEnd.3, carried out with promoter mutants of the mouse 
MAdCAM-1 gene, revealed that occupancy of both NF-κB sites is essential for 
the promoter to drive expression in response to TNF-α. The 5' NF-κB site is 
totally conserved in sequence with the mouse counterpart, whereas the 3' site is 
only slightly divergent. NF-κB is also involved in the increased expression of 
VCAM-1 and ICAM-1 by LPS, TNF-α and IL-1β. In contrast, binding sites for 
TGF-β-inducible transcription factors (NF1 and AP1), previously identified in the 
mouse promoter, were not present. Multiple AP-2 sites in addition to the NF-κB 
sites may be responsible for the increased activity of the promoter in response to 
PMA. The presence of a MyoD site (CACCTG) (SEQ ID NO:45), which is 
found within the muscle creatine kinase enhancer, is interesting, given that the 
related VCAM-1 is expressed on myoblasts and myotubes in culture and in vivo 
at sites of secondary myogenesis. 
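
Candidate transcription factor sites of the kind discussed above can be located by a simple motif scan over both strands. The following is a minimal sketch, not the analysis actually used: the MyoD motif CACCTG is taken from the text, whereas the degenerate pattern GGGRNNYYCC is a commonly cited NF-κB consensus assumed here for illustration, and the short test sequence is hypothetical rather than the MAdCAM-1 5'-flanking sequence.

```python
import re

# IUPAC nucleotide codes used by the motifs below, as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif(seq, motif):
    """Return sorted (start, strand) hits; start is 0-based on the forward strand."""
    seq = seq.upper()
    # A lookahead lets overlapping occurrences be reported as well.
    pattern = re.compile("(?=(" + "".join(IUPAC[b] for b in motif.upper()) + "))")
    hits = {(m.start(), "+") for m in pattern.finditer(seq)}
    for m in pattern.finditer(reverse_complement(seq)):
        # Convert a reverse-strand coordinate back to the forward strand.
        hits.add((len(seq) - m.start() - len(motif), "-"))
    return sorted(hits)


# Hypothetical test sequence (assumption, for illustration only).
promoter = "TTGGGACTTTCCAGCACCTGAA"
print("NF-kB-like sites:", find_motif(promoter, "GGGRNNYYCC"))
print("MyoD E-box sites:", find_motif(promoter, "CACCTG"))
```

A palindromic motif would be reported once per strand by this sketch; collapsing such pairs is left out to keep the example short.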

FISH and PCR analysis of a panel of human-rodent somatic cell hybrids 
was used to localize the MAdCAM-1 gene to chromosome 19, band p13.3. It is 
notable that the human ICAM-1 and ICAM-3 genes are located in close proximity 
(19p13.2-p13.3), raising the possibility that the MAdCAM-1, ICAM-1 and 
ICAM-3 genes are clustered together on the short arm of chromosome 19. This 
region is homologous to a region on mouse chromosome 10, and it is interesting 
therefore that the mouse MAdCAM-1 gene is located on chromosome 10. Yet 
another member of the immunoglobulin superfamily which is ubiquitously 
expressed in various tissues, termed basigin, also maps to this same region. In 
contrast, VCAM-1 and ICAM-2 are located on chromosomes 1 and 17, 
respectively. Given that the MAdCAM-1 mucin-like domain is decorated with 
carbohydrate moieties recognized by L-selectin, it is interesting to note that a 
cluster of three (FUT6-FUT3-FUT5) of five cloned human fucosyltransferase 
genes responsible for the synthesis of sialyl Lewis x and a, and related 
fucosylated antigens recognised by selectins, is located on 19p13.3. In terms of 
cancer, band 19p13.3 is frequently involved in structural anomalies of 
chromosome 19, associated with ovarian cancer, leukemia, and multiple 
myeloma. Genes at 19p13.3 which have so far been shown to be involved 
include the insulin receptor, E2A transcription factor, and MLLT1 genes. 




SEQUENCE LISTING 



(1) GENERAL INFORMATION: 

(i) APPLICANT: HUMAN GENOME SCIENCES, INC. 

9410 KEY WEST AVENUE 
ROCKVILLE, MD 20850 
UNITED STATES OF AMERICA 

THE UNIVERSITY OF AUCKLAND 
85 PARK ROAD, GRAFTON 
AUCKLAND, NEW ZEALAND 



APPLICANTS/INVENTORS: NI, JIAN 

GREENE, JOHN M. 
KRISSANSEN, GEOFFREY W. 
LEUNG, EUPHEMIA YEE FUN 
RUBEN, STEVEN M. 

(ii) TITLE OF INVENTION: HUMAN MUCOSAL ADDRESSIN CELL ADHESION 
MOLECULE-1 (MAdCAM-1) AND SPLICE VARIANTS THEREOF 

(iii) NUMBER OF SEQUENCES: 59 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: STERNE, KESSLER, GOLDSTEIN & FOX P.L.L.C. 

(B) STREET: 1100 NEW YORK AVENUE, N.W. SUITE 600 

(C) CITY: WASHINGTON 

(D) STATE: D.C. 

(E) COUNTRY: US 

(F) ZIP: 20005 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30 

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: TBA 

(B) FILING DATE: HEREWITH 

(C) CLASSIFICATION: 

(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: GOLDSTEIN, JORGE A. 

(B) REGISTRATION NUMBER: 29,021 

(C) REFERENCE/DOCKET NUMBER: 1488.057PC00/JAG/EKS/LLK 

(ix) TELECOMMUNICATION INFORMATION: 
(A) TELEPHONE: 202-371-2600 
(B) TELEFAX: 202-371-2540 






(2) INFORMATION FOR SEQ ID NO:1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1536 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1..1146 

(ix) FEATURE: 

(A) NAME/KEY: mat_peptide 

(B) LOCATION: 52..1146 

(ix) FEATURE: 

(A) NAME/KEY: sig_peptide 

(B) LOCATION: 1..49 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1: 

ATG GAT TTC GGA CTG GCC CTC CTG CTG GCG GGG CTT CTG GGG CTC CTC 48 
Met Asp Phe Gly Leu Ala Leu Leu Leu Ala Gly Leu Leu Gly Leu Leu 
-17 -15 -10 -5 

CTC GGC CAG TCC CTC CAG GTG AAG CCC CTG CAG GTG GAG CCC CCG GAG 96 
Leu Gly Gln Ser Leu Gln Val Lys Pro Leu Gln Val Glu Pro Pro Glu 
1 5 10 15 

CCG GTG GTG GCC GTG GCC TTG GGC GCC TCG CGC CAG CTC ACC TGC CGC 144 
Pro Val Val Ala Val Ala Leu Gly Ala Ser Arg Gln Leu Thr Cys Arg 
20 25 30 

CTG GCC TGC GCG GAC CGC GGG GCC TCG GTG CAG TGG CGG GGC CTG GAC 192 
Leu Ala Cys Ala Asp Arg Gly Ala Ser Val Gln Trp Arg Gly Leu Asp 
35 40 45 

ACC AGC CTG GGC GCG GTG CAG TCG GAC ACG GGC CGC AGC GTC CTC ACC 240 
Thr Ser Leu Gly Ala Val Gln Ser Asp Thr Gly Arg Ser Val Leu Thr 
50 55 60 

GTG CGC AAC GCC TCG CTG TCG GCG GCC GGG ACC CGC GTG TGC GTG GGC 288 
Val Arg Asn Ala Ser Leu Ser Ala Ala Gly Thr Arg Val Cys Val Gly 
65 70 75 

TCC TGC GGG GGC CGC ACC TTC CAG CAC ACC GTG CAG CTC CTT GTG TAC 336 
Ser Cys Gly Gly Arg Thr Phe Gln His Thr Val Gln Leu Leu Val Tyr 
80 85 90 95 

GCC TTC CCG GAC CAG CTG ACC GTC TCC CCA GCA GCC CTG GTG CCT GGT 384 
Ala Phe Pro Asp Gln Leu Thr Val Ser Pro Ala Ala Leu Val Pro Gly 
100 105 110 

GAC CCG GAG GTG GCC TGT ACG GCC CAC AAA GTC ACG CCC GTG GAC CCC 432 
Asp Pro Glu Val Ala Cys Thr Ala His Lys Val Thr Pro Val Asp Pro 
115 120 125 

AAC GCG CTC TCC TTC TCC CTG CTC GTC GGG GGC CAG GAA CTG GAG GGG 480 
Asn Ala Leu Ser Phe Ser Leu Leu Val Gly Gly Gln Glu Leu Glu Gly 
130 135 140 

GCG CAA GCC CTG GGC CCG GAG GTG CAG GAG GAG GAG GAG GAG CCC CAG 528 
Ala Gln Ala Leu Gly Pro Glu Val Gln Glu Glu Glu Glu Glu Pro Gln 
145 150 155 

GGG GAC GAG GAC GTG CTG TTC AGG GTG ACA GAG CGC TGG CGG CTG CCG 576 
Gly Asp Glu Asp Val Leu Phe Arg Val Thr Glu Arg Trp Arg Leu Pro 
160 165 170 175 

CCC CTG GGG ACC CCT GTC CCG CCC GCC CTC TAC TGC CAG GCC ACG ATG 624 
Pro Leu Gly Thr Pro Val Pro Pro Ala Leu Tyr Cys Gln Ala Thr Met 
180 185 190 

AGG CTG CCT GGC TTG GAG CTC AGC CAC CGC CAG GCC ATC CCC GTC CTG 672 
Arg Leu Pro Gly Leu Glu Leu Ser His Arg Gln Ala Ile Pro Val Leu 
195 200 205 

CAC AGC CCG ACC TCC CCG GAG CCT CCC GAC ACC ACC TCC CCG GAG TCT 720 
His Ser Pro Thr Ser Pro Glu Pro Pro Asp Thr Thr Ser Pro Glu Ser 
210 215 220 

CCC GAC ACC ACC TCC CCG GAG TCT CCC GAC ACC ACC TCC CCG GAG CCT 768 
Pro Asp Thr Thr Ser Pro Glu Ser Pro Asp Thr Thr Ser Pro Glu Pro 
225 230 235 

CCC GAC ACC ACC TCC CCG GAG CCT CCC GAC AAG ACC TCC CCG GAG CCC 816 
Pro Asp Thr Thr Ser Pro Glu Pro Pro Asp Lys Thr Ser Pro Glu Pro 
240 245 250 255 

GCC CCC CAG CAG GGC TCC ACA CAC ACC CCC AGG AGC CCA GGC TCC ACC 864 
Ala Pro Gln Gln Gly Ser Thr His Thr Pro Arg Ser Pro Gly Ser Thr 
260 265 270 

AGG ACT CGC CGC CCT GAG ATC TCC CAG GCT GGG CCC ACG CAG GGA GAA 912 
Arg Thr Arg Arg Pro Glu Ile Ser Gln Ala Gly Pro Thr Gln Gly Glu 
275 280 285 

GTG ATC CCA ACA GGC TCG TCC AAA CCT GCG GGT GAC CAG CTG CCC GCG 960 
Val Ile Pro Thr Gly Ser Ser Lys Pro Ala Gly Asp Gln Leu Pro Ala 
290 295 300 

GCT CTG TGG ACC AGC AGT GCG GTG CTG GGA CTG CTG CTC CTG GCC TTG 1008 
Ala Leu Trp Thr Ser Ser Ala Val Leu Gly Leu Leu Leu Leu Ala Leu 
305 310 315 

CCC ACG TAT CAC CTC TGG AAA CGC TGC CGG CAC CTG GCT GAG GAC GAC 1056 
Pro Thr Tyr His Leu Trp Lys Arg Cys Arg His Leu Ala Glu Asp Asp 
320 325 330 335 

ACC CAC CCA CCA GCT TCT CTG AGG CTT CTG CCC CAG GTG TCG GCC TGG 1104 
Thr His Pro Pro Ala Ser Leu Arg Leu Leu Pro Gln Val Ser Ala Trp 
340 345 350 

GCT GGG TTA AGG GGG ACC GGC CAG GTC GGG ATC AGC CCC TCC 1146 
Ala Gly Leu Arg Gly Thr Gly Gln Val Gly Ile Ser Pro Ser 
355 360 365 

TGAGTGGCCA GCCTTTCCCC CTGTGAAAGC AAAATAGCTT GGACCCCTTC AAGTTGAGAA 1206 

CTGGTCAGGG CAAACCTGCC TCCCATTCTA CTCAAAGTCA TCCCTCTGTT CACAGAGATG 1266 

GATGCATGTT CTGATTGCCT CTTTGGAGAA GCTCATCAGA AACTCAAAAG AAGGCCACTG 1326 

TTTGTCTCAC CTACCCATGA CCTGAAGCCC CTCCCTGAGT GGTCCCCACC TTTCTGGACG 1386 

GAACCACGTA CTTTTTACAT ACATTGATTC ATGTCTCACG TCTCCCTAAA AATGCGTAAG 1446 

ACCAAGCTGT GCCCTGACCA CCCTGGGCCC CTGTCGTCAG GACCTCCTGA GGCTTTGGCA 1506 

AATAAACCTC CTAAAATGAT AAAAAAAAAA 1536 
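
The pairing of codon rows with three-letter residue rows in this listing can be checked mechanically. The sketch below translates a coding sequence with the standard genetic code and is applied to the first sixteen codons of SEQ ID NO:1 as printed above; the function and variable names are illustrative assumptions, and the code is not part of the PatentIn software used to prepare the listing.

```python
# Standard genetic code, indexed by base order T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}


def translate(cds):
    """Translate a coding sequence (whitespace ignored) into three-letter codes."""
    seq = "".join(cds.split()).upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return [THREE_LETTER[CODON_TABLE[seq[i:i + 3]]] for i in range(0, len(seq), 3)]


# First codon row of SEQ ID NO:1 (nucleotides 1-48), copied from the listing.
FIRST_ROW = "ATG GAT TTC GGA CTG GCC CTC CTG CTG GCG GGG CTT CTG GGG CTC CTC"
print(" ".join(translate(FIRST_ROW)))
```

Running the same function over a complete CDS row by row is one way to spot transcription errors in either the nucleotide or the residue lines.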



(2) INFORMATION FOR SEQ ID NO:2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 382 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2: 

Met Asp Phe Gly Leu Ala Leu Leu Leu Ala Gly Leu Leu Gly Leu Leu 
-17 -15 -10 -5 

Leu Gly Gln Ser Leu Gln Val Lys Pro Leu Gln Val Glu Pro Pro Glu 
1 5 10 15 

Pro Val Val Ala Val Ala Leu Gly Ala Ser Arg Gln Leu Thr Cys Arg 
20 25 30 

Leu Ala Cys Ala Asp Arg Gly Ala Ser Val Gln Trp Arg Gly Leu Asp 
35 40 45 

Thr Ser Leu Gly Ala Val Gln Ser Asp Thr Gly Arg Ser Val Leu Thr 
50 55 60 

Val Arg Asn Ala Ser Leu Ser Ala Ala Gly Thr Arg Val Cys Val Gly 
65 70 75 

Ser Cys Gly Gly Arg Thr Phe Gln His Thr Val Gln Leu Leu Val Tyr 
80 85 90 95 

Ala Phe Pro Asp Gln Leu Thr Val Ser Pro Ala Ala Leu Val Pro Gly 
100 105 110 

Asp Pro Glu Val Ala Cys Thr Ala His Lys Val Thr Pro Val Asp Pro 
115 120 125 

Asn Ala Leu Ser Phe Ser Leu Leu Val Gly Gly Gln Glu Leu Glu Gly 
130 135 140 

Ala Gln Ala Leu Gly Pro Glu Val Gln Glu Glu Glu Glu Glu Pro Gln 
145 150 155 

Gly Asp Glu Asp Val Leu Phe Arg Val Thr Glu Arg Trp Arg Leu Pro 
160 165 170 175 

Pro Leu Gly Thr Pro Val Pro Pro Ala Leu Tyr Cys Gln Ala Thr Met 
180 185 190 

Arg Leu Pro Gly Leu Glu Leu Ser His Arg Gln Ala Ile Pro Val Leu 
195 200 205 

His Ser Pro Thr Ser Pro Glu Pro Pro Asp Thr Thr Ser Pro Glu Ser 
210 215 220 

Pro Asp Thr Thr Ser Pro Glu Ser Pro Asp Thr Thr Ser Pro Glu Pro 
225 230 235 

Pro Asp Thr Thr Ser Pro Glu Pro Pro Asp Lys Thr Ser Pro Glu Pro 
240 245 250 255 

Ala Pro Gln Gln Gly Ser Thr His Thr Pro Arg Ser Pro Gly Ser Thr 
260 265 270 

Arg Thr Arg Arg Pro Glu Ile Ser Gln Ala Gly Pro Thr Gln Gly Glu 
275 280 285 

Val Ile Pro Thr Gly Ser Ser Lys Pro Ala Gly Asp Gln Leu Pro Ala 
290 295 300 

Ala Leu Trp Thr Ser Ser Ala Val Leu Gly Leu Leu Leu Leu Ala Leu 
305 310 315 

Pro Thr Tyr His Leu Trp Lys Arg Cys Arg His Leu Ala Glu Asp Asp 
320 325 330 335 

Thr His Pro Pro Ala Ser Leu Arg Leu Leu Pro Gln Val Ser Ala Trp 
340 345 350 

Ala Gly Leu Arg Gly Thr Gly Gln Val Gly Ile Ser Pro Ser 
355 360 365 

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1488 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1..1098 

(ix) FEATURE: 

(A) NAME/KEY: mat_peptide 

(B) LOCATION: 52..1098 

(ix) FEATURE: 

(A) NAME/KEY: sig_peptide 

(B) LOCATION: 1..49 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

ATG GAT TTC GGA CTG GCC CTC CTG CTG GCG GGG CTT CTG GGG CTC CTC 48 
Met Asp Phe Gly Leu Ala Leu Leu Leu Ala Gly Leu Leu Gly Leu Leu 
-17 -15 -10 -5 

CTC GGC CAG TCC CTC CAG GTG AAG CCC CTG CAG GTG GAG CCC CCG GAG 96 
Leu Gly Gln Ser Leu Gln Val Lys Pro Leu Gln Val Glu Pro Pro Glu 
1 5 10 15 

CCG GTG GTG GCC GTG GCC TTG GGC GCC TCG CGC CAG CTC ACC TGC CGC 144 
Pro Val Val Ala Val Ala Leu Gly Ala Ser Arg Gln Leu Thr Cys Arg 
20 25 30 

CTG GCC TGC GCG GAC CGC GGG GCC TCG GTG CAG TGG CGG GGC CTG GAC 192 
Leu Ala Cys Ala Asp Arg Gly Ala Ser Val Gln Trp Arg Gly Leu Asp 
35 40 45 

ACC AGC CTG GGC GCG GTG CAG TCG GAC ACG GGC CGC AGC GTC CTC ACC 240 
Thr Ser Leu Gly Ala Val Gln Ser Asp Thr Gly Arg Ser Val Leu Thr 
50 55 60 

GTG CGC AAC GCC TCG CTG TCG GCG GCC GGG ACC CGC GTG TGC GTG GGC 288 
Val Arg Asn Ala Ser Leu Ser Ala Ala Gly Thr Arg Val Cys Val Gly 
65 70 75 

TCC TGC GGG GGC CGC ACC TTC CAG CAC ACC GTG CAG CTC CTT GTG TAC 336 
Ser Cys Gly Gly Arg Thr Phe Gln His Thr Val Gln Leu Leu Val Tyr 
80 85 90 95 

GCC TTC CCG GAC CAG CTG ACC GTC TCC CCA GCA GCC CTG GTG CCT GGT 384 
Ala Phe Pro Asp Gln Leu Thr Val Ser Pro Ala Ala Leu Val Pro Gly 
100 105 110 

GAC CCG GAG GTG GCC TGT ACG GCC CAC AAA GTC ACG CCC GTG GAC CCC 432 
Asp Pro Glu Val Ala Cys Thr Ala His Lys Val Thr Pro Val Asp Pro 
115 120 125 

AAC GCG CTC TCC TTC TCC CTG CTC GTC GGG GGC CAG GAA CTG GAG GGG 480 
Asn Ala Leu Ser Phe Ser Leu Leu Val Gly Gly Gln Glu Leu Glu Gly 
130 135 140 

GCG CAA GCC CTG GGC CCG GAG GTG CAG GAG GAG GAG GAG GAG CCC CAG 528 
Ala Gln Ala Leu Gly Pro Glu Val Gln Glu Glu Glu Glu Glu Pro Gln 
145 150 155 

GGG GAC GAG GAC GTG CTG TTC AGG GTG ACA GAG CGC TGG CGG CTG CCG 576 
Gly Asp Glu Asp Val Leu Phe Arg Val Thr Glu Arg Trp Arg Leu Pro 
160 165 170 175 

CCC CTG GGG ACC CCT GTC CCG CCC GCC CTC TAC TGC CAG GCC ACG ATG 624 
Pro Leu Gly Thr Pro Val Pro Pro Ala Leu Tyr Cys Gln Ala Thr Met 
180 185 190 

AGG CTG CCT GGC TTG GAG CTC AGC CAC CGC CAG GCC ATC CCC GTC CTG 672 
Arg Leu Pro Gly Leu Glu Leu Ser His Arg Gln Ala Ile Pro Val Leu 
195 200 205 

CAC AGC CCG ACC TCC CCG GAG TCT CCC GAC ACC ACC TCC CCG GAG CCT 720 
His Ser Pro Thr Ser Pro Glu Ser Pro Asp Thr Thr Ser Pro Glu Pro 
210 215 220 

CCC GAC ACC ACC TCC CCG GAG CCT CCC GAC AAG ACC TCC CCG GAG CCC 768 
Pro Asp Thr Thr Ser Pro Glu Pro Pro Asp Lys Thr Ser Pro Glu Pro 
225 230 235 

GCC CCC CAG CAG GGC TCC ACA CAC ACC CCC AGG AGC CCA GGC TCC ACC 816 
Ala Pro Gln Gln Gly Ser Thr His Thr Pro Arg Ser Pro Gly Ser Thr 
240 245 250 255 

AGG ACT CGC CGC CCT GAG ATC TCC CAG GCT GGG CCC ACG CAG GGA GAA 864 
Arg Thr Arg Arg Pro Glu Ile Ser Gln Ala Gly Pro Thr Gln Gly Glu 
260 265 270 

GTG ATC CCA ACA GGC TCG TCC AAA CCT GCG GGT GAC CAG CTG CCC GCG 912 
Val Ile Pro Thr Gly Ser Ser Lys Pro Ala Gly Asp Gln Leu Pro Ala 
275 280 285 

GCT CTG TGG ACC AGC AGT GCG GTG CTG GGA CTG CTG CTC CTG GCC TTG 960 
Ala Leu Trp Thr Ser Ser Ala Val Leu Gly Leu Leu Leu Leu Ala Leu 
290 295 300 

CCC ACG TAT CAC CTC TGG AAA CGC TGC CGG CAC CTG GCT GAG GAC GAC 1008 
Pro Thr Tyr His Leu Trp Lys Arg Cys Arg His Leu Ala Glu Asp Asp 
305 310 315 

ACC CAC CCA CCA GCT TCT CTG AGG CTT CTG CCC CAG GTG TCG GCC TGG 1056 
Thr His Pro Pro Ala Ser Leu Arg Leu Leu Pro Gln Val Ser Ala Trp 
320 325 330 335 

GCT GGG TTA AGG GGG ACC GGC CAG GTC GGG ATC AGC CCC TCC 1098 
Ala Gly Leu Arg Gly Thr Gly Gln Val Gly Ile Ser Pro Ser 
340 345 

TGAGTGGCCA GCCTTTCCCC CTGTGAAAGC AAAATAGCTT GGACCCCTTC AAGTTGAGAA 1158 

CTGGTCAGGG CAAACCTGCC TCCCATTCTA CTCAAAGTCA TCCCTCTGTT CACAGAGATG 1218 

GATGCATGTT CTGATTGCCT CTTTGGAGAA GCTCATCAGA AACTCAAAAG AAGGCCACTG 1278 

TTTGTCTCAC CTACCCATGA CCTGAAGCCC CTCCCTGAGT GGTCCCCACC TTTCTGGACG 1338 

GAACCACGTA CTTTTTACAT ACATTGATTC ATGTCTCACG TCTCCCTAAA AATGCGTAAG 1398 

ACCAAGCTGT GCCCTGACCA CCCTGGGCCC CTGTCGTCAG GACCTCCTGA GGCTTTGGCA 1458 

AATAAACCTC CTAAAATGAT AAAAAAAAAA 1488 

(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 366 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

Met Asp Phe Gly Leu Ala Leu Leu Leu Ala Gly Leu Leu Gly Leu Leu 
-17 -15 -10 -5 

Leu Gly Gln Ser Leu Gln Val Lys Pro Leu Gln Val Glu Pro Pro Glu 
1 5 10 15 

Pro Val Val Ala Val Ala Leu Gly Ala Ser Arg Gln Leu Thr Cys Arg 
20 25 30 

Leu Ala Cys Ala Asp Arg Gly Ala Ser Val Gln Trp Arg Gly Leu Asp 
35 40 45 

Thr Ser Leu Gly Ala Val Gln Ser Asp Thr Gly Arg Ser Val Leu Thr 
50 55 60 

Val Arg Asn Ala Ser Leu Ser Ala Ala Gly Thr Arg Val Cys Val Gly 
65 70 75 

Ser Cys Gly Gly Arg Thr Phe Gln His Thr Val Gln Leu Leu Val Tyr 
80 85 90 95 

Ala Phe Pro Asp Gln Leu Thr Val Ser Pro Ala Ala Leu Val Pro Gly 
100 105 110 

Asp Pro Glu Val Ala Cys Thr Ala His Lys Val Thr Pro Val Asp Pro 
115 120 125 

Asn Ala Leu Ser Phe Ser Leu Leu Val Gly Gly Gln Glu Leu Glu Gly 
130 135 140 

Ala Gln Ala Leu Gly Pro Glu Val Gln Glu Glu Glu Glu Glu Pro Gln 
145 150 155 

Gly Asp Glu Asp Val Leu Phe Arg Val Thr Glu Arg Trp Arg Leu Pro 
160 165 170 175 

Pro Leu Gly Thr Pro Val Pro Pro Ala Leu Tyr Cys Gln Ala Thr Met 
180 185 190 

Arg Leu Pro Gly Leu Glu Leu Ser His Arg Gln Ala Ile Pro Val Leu 
195 200 205 

His Ser Pro Thr Ser Pro Glu Ser Pro Asp Thr Thr Ser Pro Glu Pro 
210 215 220 

Pro Asp Thr Thr Ser Pro Glu Pro Pro Asp Lys Thr Ser Pro Glu Pro 
225 230 235 

Ala Pro Gln Gln Gly Ser Thr His Thr Pro Arg Ser Pro Gly Ser Thr 
240 245 250 255 

Arg Thr Arg Arg Pro Glu Ile Ser Gln Ala Gly Pro Thr Gln Gly Glu 
260 265 270 

Val Ile Pro Thr Gly Ser Ser Lys Pro Ala Gly Asp Gln Leu Pro Ala 
275 280 285 

Ala Leu Trp Thr Ser Ser Ala Val Leu Gly Leu Leu Leu Leu Ala Leu 
290 295 300 

Pro Thr Tyr His Leu Trp Lys Arg Cys Arg His Leu Ala Glu Asp Asp 
305 310 315 

Thr His Pro Pro Ala Ser Leu Arg Leu Leu Pro Gln Val Ser Ala Trp 
320 325 330 335 

Ala Gly Leu Arg Gly Thr Gly Gln Val Gly Ile Ser Pro Ser 
340 345 

(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1179 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic) 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1..789 

(ix) FEATURE: 

(A) NAME/KEY: mat_peptide 

(B) LOCATION: 52..789 

(ix) FEATURE: 

(A) NAME/KEY: sig_peptide 

(B) LOCATION: 1..49 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5: 

ATG GAT TTC GGA CTG GCC CTC CTG CTG GCG GGG CTT CTG GGG CTC CTC 48 
Met Asp Phe Gly Leu Ala Leu Leu Leu Ala Gly Leu Leu Gly Leu Leu 
-17 -15 -10 -5 

CTC GGC CAG TCC CTC CAG GTG AAG CCC CTG CAG GTG GAG CCC CCG GAG 96 
Leu Gly Gln Ser Leu Gln Val Lys Pro Leu Gln Val Glu Pro Pro Glu 
1 5 10 15 

CCG GTG GTG GCC GTG GCC TTG GGC GCC TCG CGC CAG CTC ACC TGC CGC 144 
Pro Val Val Ala Val Ala Leu Gly Ala Ser Arg Gln Leu Thr Cys Arg 
20 25 30 

CTG GCC TGC GCG GAC CGC GGG GCC TCG GTG CAG TGG CGG GGC CTG GAC 192 
Leu Ala Cys Ala Asp Arg Gly Ala Ser Val Gln Trp Arg Gly Leu Asp 
35 40 45 

ACC AGC CTG GGC GCG GTG CAG TCG GAC ACG GGC CGC AGC GTC CTC ACC 240 
Thr Ser Leu Gly Ala Val Gln Ser Asp Thr Gly Arg Ser Val Leu Thr 
50 55 60 

GTG CGC AAC GCC TCG CTG TCG GCG GCC GGG ACC CGC GTG TGC GTG GGC 288 
Val Arg Asn Ala Ser Leu Ser Ala Ala Gly Thr Arg Val Cys Val Gly 
65 70 75 

TCC TGC GGG GGC CGC ACC TTC CAG CAC ACC GTG CAG CTC CTT GTG TAC 336 
Ser Cys Gly Gly Arg Thr Phe Gln His Thr Val Gln Leu Leu Val Tyr 
80 85 90 95 

GCC TTC CCG GAC CAG CTG ACC GTC TCC CCA GCA GCC CTG GTG CCT GGT 384 
Ala Phe Pro Asp Gln Leu Thr Val Ser Pro Ala Ala Leu Val Pro Gly 
100 105 110 

GAC CCG GAG GTG GCC TGT ACG GCC CAC AAA GTC ACG CCC GTG GAC CCC 432 
Asp Pro Glu Val Ala Cys Thr Ala His Lys Val Thr Pro Val Asp Pro 
115 120 125 

AAC GCG CTC TCC TTC TCC CTG CTC GTC GGG GGC CAG CAG GGC TCC ACA 480 
Asn Ala Leu Ser Phe Ser Leu Leu Val Gly Gly Gln Gln Gly Ser Thr 
130 135 140 

CAC ACC CCC AGG AGC CCA GGC TCC ACC AGG ACT CGC CGC CCT GAG ATC 528 
His Thr Pro Arg Ser Pro Gly Ser Thr Arg Thr Arg Arg Pro Glu Ile 
145 150 155 

TCC CAG GCT GGG CCC ACG CAG GGA GAA GTG ATC CCA ACA GGC TCG TCC 576 
Ser Gln Ala Gly Pro Thr Gln Gly Glu Val Ile Pro Thr Gly Ser Ser 
160 165 170 175 

AAA CCT GCG GGT GAC CAG CTG CCC GCG GCT CTG TGG ACC AGC AGT GCG 624 
Lys Pro Ala Gly Asp Gln Leu Pro Ala Ala Leu Trp Thr Ser Ser Ala 
180 185 190 

GTG CTG GGA CTG CTG CTC CTG GCC TTG CCC ACG TAT CAC CTC TGG AAA 672 
Val Leu Gly Leu Leu Leu Leu Ala Leu Pro Thr Tyr His Leu Trp Lys 
195 200 205 

CGC TGC CGG CAC CTG GCT GAG GAC GAC ACC CAC CCA CCA GCT TCT CTG 720 
Arg Cys Arg His Leu Ala Glu Asp Asp Thr His Pro Pro Ala Ser Leu 
210 215 220 

AGG CTT CTG CCC CAG GTG TCG GCC TGG GCT GGG TTA AGG GGG ACC GGC 768 
Arg Leu Leu Pro Gln Val Ser Ala Trp Ala Gly Leu Arg Gly Thr Gly 
225 230 235 

CAG GTC GGG ATC AGC CCC TCC TGAGTGGCCA GCCTTTCCCC CTGTGAAAGC 819 
Gln Val Gly Ile Ser Pro Ser 
240 245 

AAAATAGCTT GGACCCCTTC AAGTTGAGAA CTGGTCAGGG CAAACCTGCC TCCCATTCTA 879 

CTCAAAGTCA TCCCTCTGTT CACAGAGATG GATGCATGTT CTGATTGCCT CTTTGGAGAA 939 

GCTCATCAGA AACTCAAAAG AAGGCCACTG TTTGTCTCAC CTACCCATGA CCTGAAGCCC 999 

CTCCCTGAGT GGTCCCCACC TTTCTGGACG GAACCACGTA CTTTTTACAT ACATTGATTC 1059 

ATGTCTCACG TCTCCCTAAA AATGCGTAAG ACCAAGCTGT GCCCTGACCA CCCTGGGCCC 1119 

CTGTCGTCAG GACCTCCTGA GGCTTTGGCA AATAAACCTC CTAAAATGAT AAAAAAAAAA 1179 



(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 263 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

Met Asp Phe Gly Leu Ala Leu Leu Leu Ala Gly Leu Leu Gly Leu Leu 
-17 -15 -10 -5 

Leu Gly Gln Ser Leu Gln Val Lys Pro Leu Gln Val Glu Pro Pro Glu 
1 5 10 15 

Pro Val Val Ala Val Ala Leu Gly Ala Ser Arg Gln Leu Thr Cys Arg 
20 25 30 

Leu Ala Cys Ala Asp Arg Gly Ala Ser Val Gln Trp Arg Gly Leu Asp 
35 40 45 

Thr Ser Leu Gly Ala Val Gln Ser Asp Thr Gly Arg Ser Val Leu Thr 
50 55 60 

Val Arg Asn Ala Ser Leu Ser Ala Ala Gly Thr Arg Val Cys Val Gly 
65 70 75 

Ser Cys Gly Gly Arg Thr Phe Gln His Thr Val Gln Leu Leu Val Tyr 
80 85 90 95 

Ala Phe Pro Asp Gln Leu Thr Val Ser Pro Ala Ala Leu Val Pro Gly 
100 105 110 

Asp Pro Glu Val Ala Cys Thr Ala His Lys Val Thr Pro Val Asp Pro 
115 120 125 

Asn Ala Leu Ser Phe Ser Leu Leu Val Gly Gly Gln Gln Gly Ser Thr 
130 135 140 

His Thr Pro Arg Ser Pro Gly Ser Thr Arg Thr Arg Arg Pro Glu Ile 
145 150 155 

Ser Gln Ala Gly Pro Thr Gln Gly Glu Val Ile Pro Thr Gly Ser Ser 
160 165 170 175 

Lys Pro Ala Gly Asp Gln Leu Pro Ala Ala Leu Trp Thr Ser Ser Ala 
180 185 190 

Val Leu Gly Leu Leu Leu Leu Ala Leu Pro Thr Tyr His Leu Trp Lys 
195 200 205 

Arg Cys Arg His Leu Ala Glu Asp Asp Thr His Pro Pro Ala Ser Leu 
210 215 220 

Arg Leu Leu Pro Gln Val Ser Ala Trp Ala Gly Leu Arg Gly Thr Gly 
225 230 235 

Gln Val Gly Ile Ser Pro Ser 
240 245 

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1320 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1..930 

(ix) FEATURE: 

(A) NAME/KEY: mat_peptide 

(B) LOCATION: 52..930 

(ix) FEATURE: 

(A) NAME/KEY: sig_peptide 

(B) LOCATION: 1..49 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

ATG GAT TTC GGA CTG GCC CTC CTG CTG GCG GGG CTT CTG GGG CTC CTC 48 
Met Asp Phe Gly Leu Ala Leu Leu Leu Ala Gly Leu Leu Gly Leu Leu 
-17 -15 -10 -5 

CTC GGC CAG TCC CTC CAG GTG AAG CCC CTG CAG GTG GAG CCC CCG GAG 96 
Leu Gly Gln Ser Leu Gln Val Lys Pro Leu Gln Val Glu Pro Pro Glu 
1 5 10 15 

CCG GTG GTG GCC GTG GCC TTG GGC GCC TCG CGC CAG CTC ACC TGC CGC 144 
Pro Val Val Ala Val Ala Leu Gly Ala Ser Arg Gln Leu Thr Cys Arg 
20 25 30 

CTG GCC TGC GCG GAC CGC GGG GCC TCG GTG CAG TGG CGG GGC CTG GAC 192 
Leu Ala Cys Ala Asp Arg Gly Ala Ser Val Gln Trp Arg Gly Leu Asp 
35 40 45 

ACC AGC CTG GGC GCG GTG CAG TCG GAC ACG GGC CGC AGC GTC CTC ACC 240 
Thr Ser Leu Gly Ala Val Gln Ser Asp Thr Gly Arg Ser Val Leu Thr 
50 55 60 

GTG CGC AAC GCC TCG CTG TCG GCG GCC GGG ACC CGC GTG TGC GTG GGC 288 
Val Arg Asn Ala Ser Leu Ser Ala Ala Gly Thr Arg Val Cys Val Gly 
65 70 75 

TCC TGC GGG GGC CGC ACC TTC CAG CAC ACC GTG CAG CTC CTT GTG TAC 336 
Ser Cys Gly Gly Arg Thr Phe Gln His Thr Val Gln Leu Leu Val Tyr 
80 85 90 95 

GCC TTC CCG GAC CAG CTG ACC GTC TCC CCA GCA GCC CTG GTG CCT GGT 384 
Ala Phe Pro Asp Gln Leu Thr Val Ser Pro Ala Ala Leu Val Pro Gly 
100 105 110 

GAC CCG GAG GTG GCC TGT ACG GCC CAC AAA GTC ACG CCC GTG GAC CCC 432 
Asp Pro Glu Val Ala Cys Thr Ala His Lys Val Thr Pro Val Asp Pro 
115 120 125 

AAC GCG CTC TCC TTC TCC CTG CTC GTC GGG GGC CAG GAA CTG GAG GGG 480 
Asn Ala Leu Ser Phe Ser Leu Leu Val Gly Gly Gln Glu Leu Glu Gly 
130 135 140 

GCG CAA GCC CTG GGC CCG GAG TCT CCC GAC ACC ACC TCC CCG GAG TCT 528 
Ala Gln Ala Leu Gly Pro Glu Ser Pro Asp Thr Thr Ser Pro Glu Ser 
145 150 155 

CCC GAC ACC ACC TCC CCG GAG CCT CCC GAC ACC ACC TCC CCG GAG CCT 576 
Pro Asp Thr Thr Ser Pro Glu Pro Pro Asp Thr Thr Ser Pro Glu Pro 
160 165 170 175 

CCC GAC AAG ACC TCC CCG GAG CCC GCC CCC CAG CAG GGC TCC ACA CAC 624 
Pro Asp Lys Thr Ser Pro Glu Pro Ala Pro Gln Gln Gly Ser Thr His 
180 185 190 

ACC CCC AGG AGC CCA GGC TCC ACC AGG ACT CGC CGC CCT GAG ATC TCC 672 
Thr Pro Arg Ser Pro Gly Ser Thr Arg Thr Arg Arg Pro Glu Ile Ser 
195 200 205 

CAG GCT GGG CCC ACG CAG GGA GAA GTG ATC CCA ACA GGC TCG TCC AAA 720 
Gln Ala Gly Pro Thr Gln Gly Glu Val Ile Pro Thr Gly Ser Ser Lys 
210 215 220 

CCT GCG GGT GAC CAG CTG CCC GCG GCT CTG TGG ACC AGC AGT GCG GTG 768 
Pro Ala Gly Asp Gln Leu Pro Ala Ala Leu Trp Thr Ser Ser Ala Val 
225 230 235 

CTG GGA CTG CTG CTC CTG GCC TTG CCC ACG TAT CAC CTC TGG AAA CGC 816 
Leu Gly Leu Leu Leu Leu Ala Leu Pro Thr Tyr His Leu Trp Lys Arg 
240 245 250 255 

TGC CGG CAC CTG GCT GAG GAC GAC ACC CAC CCA CCA GCT TCT CTG AGG 864 
Cys Arg His Leu Ala Glu Asp Asp Thr His Pro Pro Ala Ser Leu Arg 
260 265 270 

CTT CTG CCC CAG GTG TCG GCC TGG GCT GGG TTA AGG GGG ACC GGC CAG 912 
Leu Leu Pro Gln Val Ser Ala Trp Ala Gly Leu Arg Gly Thr Gly Gln 
275 280 285 

GTC GGG ATC AGC CCC TCC TGAGTGGCCA GCCTTTCCCC CTGTGAAAGC 960 
Val Gly Ile Ser Pro Ser 
290 

AAAATAGCTT GGACCCCTTC AAGTTGAGAA CTGGTCAGGG CAAACCTGCC TCCCATTCTA 1020 

CTCAAAGTCA TCCCTCTGTT CACAGAGATG GATGCATGTT CTGATTGCCT CTTTGGAGAA 1080 

GCTCATCAGA AACTCAAAAG AAGGCCACTG TTTGTCTCAC CTACCCATGA CCTGAAGCCC 1140 

CTCCCTGAGT GGTCCCCACC TTTCTGGACG GAACCACGTA CTTTTTACAT ACATTGATTC 1200 

ATGTCTCACG TCTCCCTAAA AATGCGTAAG ACCAAGCTGT GCCCTGACCA CCCTGGGCCC 1260 

CTGTCGTCAG GACCTCCTGA GGCTTTGGCA AATAAACCTC CTAAAATGAT AAAAAAAAAA 1320 

(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 310 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

Met Asp Phe Gly Leu Ala Leu Leu Leu Ala Gly Leu Leu Gly Leu Leu 
-17 -15 -10 -5 

Leu Gly Gln Ser Leu Gln Val Lys Pro Leu Gln Val Glu Pro Pro Glu 
1 5 10 15 

Pro Val Val Ala Val Ala Leu Gly Ala Ser Arg Gln Leu Thr Cys Arg 
20 25 30 

Leu Ala Cys Ala Asp Arg Gly Ala Ser Val Gln Trp Arg Gly Leu Asp 
35 40 45 

Thr Ser Leu Gly Ala Val Gln Ser Asp Thr Gly Arg Ser Val Leu Thr 
50 55 60 

Val Arg Asn Ala Ser Leu Ser Ala Ala Gly Thr Arg Val Cys Val Gly 
65 70 75 

Ser Cys Gly Gly Arg Thr Phe Gln His Thr Val Gln Leu Leu Val Tyr 
80 85 90 95 

Ala Phe Pro Asp Gln Leu Thr Val Ser Pro Ala Ala Leu Val Pro Gly 
100 105 110 

Asp Pro Glu Val Ala Cys Thr Ala His Lys Val Thr Pro Val Asp Pro 
115 120 125 

Asn Ala Leu Ser Phe Ser Leu Leu Val Gly Gly Gln Glu Leu Glu Gly 
130 135 140 

Ala Gln Ala Leu Gly Pro Glu Ser Pro Asp Thr Thr Ser Pro Glu Ser 
145 150 155 

Pro Asp Thr Thr Ser Pro Glu Pro Pro Asp Thr Thr Ser Pro Glu Pro 
160 165 170 175 

Pro Asp Lys Thr Ser Pro Glu Pro Ala Pro Gln Gln Gly Ser Thr His 
180 185 190 

Thr Pro Arg Ser Pro Gly Ser Thr Arg Thr Arg Arg Pro Glu Ile Ser 
195 200 205 

Gln Ala Gly Pro Thr Gln Gly Glu Val Ile Pro Thr Gly Ser Ser Lys 
210 215 220 

Pro Ala Gly Asp Gln Leu Pro Ala Ala Leu Trp Thr Ser Ser Ala Val 
225 230 235 

Leu Gly Leu Leu Leu Leu Ala Leu Pro Thr Tyr His Leu Trp Lys Arg 
240 245 250 255 

Cys Arg His Leu Ala Glu Asp Asp Thr His Pro Pro Ala Ser Leu Arg 
260 265 270 

Leu Leu Pro Gln Val Ser Ala Trp Ala Gly Leu Arg Gly Thr Gly Gln 
275 280 285 

Val Gly Ile Ser Pro Ser 
290 



(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1329 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1..939 

(ix) FEATURE: 

(A) NAME/KEY: mat_peptide 

(B) LOCATION: 52..939 

(ix) FEATURE: 

(A) NAME/KEY: sig_peptide 

(B) LOCATION: 1..49 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 

ATG GAT TTC GGA CTG GCC CTC CTG CTG GCG GGG CTT CTG GGG CTC CTC 48 
Met Asp Phe Gly Leu Ala Leu Leu Leu Ala Gly Leu Leu Gly Leu Leu 
-17 -15 -10 -5 

CTC GGC CAG TCC CTC CAG GTG AAG CCC CTG CAG GTG GAG CCC CCG GAG 96 
Leu Gly Gln Ser Leu Gln Val Lys Pro Leu Gln Val Glu Pro Pro Glu 
1 5 10 15 



144 
CTG GCC TGC GCG GAC CGC GGG GCC TCG GTG CAG TGG CGG GGC CTG GAC 192 
Leu Ala Cys Ala Asp Arg Gly Ala Ser Val Gln Trp Arg Gly Leu Asp 
35 40 45 

ACC AGC CTG GGC GCG GTG CAG TCG GAC ACG GGC CGC AGC GTC CTC ACC 240 
Thr Ser Leu Gly Ala Val Gln Ser Asp Thr Gly Arg Ser Val Leu Thr 
50 55 60 

GTG CGC AAC GCC TCG CTG TCG GCG GCC GGG ACC CGC GTG TGC GTG GGC 288 
Val Arg Asn Ala Ser Leu Ser Ala Ala Gly Thr Arg Val Cys Val Gly 
65 70 75 

TCC TGC GGG GGC CGC ACC TTC CAG CAC ACC GTG CAG CTC CTT GTG TAC 336 
Ser Cys Gly Gly Arg Thr Phe Gln His Thr Val Gln Leu Leu Val Tyr 
80 85 90 95 

GCC TTC CCG GAC CAG CTG ACC GTC TCC CCA GCA GCC CTG GTG CCT GGT 384 
Ala Phe Pro Asp Gln Leu Thr Val Ser Pro Ala Ala Leu Val Pro Gly 
100 105 110 

GAC CCG GAG GTG GCC TGT ACG GCC CAC AAA GTC ACG CCC GTG GAC CCC 432 
Asp Pro Glu Val Ala Cys Thr Ala His Lys Val Thr Pro Val Asp Pro 
115 120 125 

AAC GCG CTC TCC TTC TCC CTG CTC GTC GGG GGC CAG GAA CTG GAG GGG 480 
Asn Ala Leu Ser Phe Ser Leu Leu Val Gly Gly Gln Glu Leu Glu Gly 
130 135 140 

GCG CAA GCC CTG GGC CCG GAG GTG CAG GAG TCT CCC GAC ACC ACC TCC 528 
Ala Gln Ala Leu Gly Pro Glu Val Gln Glu Ser Pro Asp Thr Thr Ser 
145 150 155 

CCG GAG TCT CCC GAC ACC ACC TCC CCG GAG CCT CCC GAC ACC ACC TCC 576 
Pro Glu Ser Pro Asp Thr Thr Ser Pro Glu Pro Pro Asp Thr Thr Ser 
160 165 170 175 

CCG GAG CCT CCC GAC AAG ACC TCC CCG GAG CCC GCC CCC CAG CAG GGC 624 
Pro Glu Pro Pro Asp Lys Thr Ser Pro Glu Pro Ala Pro Gln Gln Gly 
180 185 190 

TCC ACA CAC ACC CCC AGG AGC CCA GGC TCC ACC AGG ACT CGC CGC CCT 672 
Ser Thr His Thr Pro Arg Ser Pro Gly Ser Thr Arg Thr Arg Arg Pro 
195 200 205 

GAG ATC TCC CAG GCT GGG CCC ACG CAG GGA GAA GTG ATC CCA ACA GGC 720 
Glu Ile Ser Gln Ala Gly Pro Thr Gln Gly Glu Val Ile Pro Thr Gly 
210 215 220 

TCG TCC AAA CCT GCG GGT GAC CAG CTG CCC GCG GCT CTG TGG ACC AGC 768 
Ser Ser Lys Pro Ala Gly Asp Gln Leu Pro Ala Ala Leu Trp Thr Ser 
225 230 235 

AGT GCG GTG CTG CGA CTG CTG CTC CTG GCC TTG CCC ACG TAT CAC CTC 816 
Ser Ala Val Leu Gly Leu Leu Leu Leu Ala Leu Pro Thr Tyr His Leu 
240 245 250 255 
TGG AAA CGC TGC CGG CAC CTG GCT GAG GAC GAG ACC CAC CCA CCA GCT 864 
Trp Lys Arg Cys Arg His Leu Ala Glu Asp Asp Thr His Pro Pro Ala 
260 265 270 

TCT CTG AGG CTT CTG CCC CAG GTG TCG GCC TGG GCT GGG TTA AGG GGG 912 
Ser Leu Arg Leu Leu Pro Gln Val Ser Ala Trp Ala Gly Leu Arg Gly 
275 280 285 

ACC GGC CAG GTC GGG ATC AGC CCC TCC TGAGTGGCCA GCCTTTCCCC 959 
Thr Gly Gln Val Gly Ile Ser Pro Ser 
290 295 



CTGTGAAAGC 


AAAATAGCTT 


GGACCCCTTC 


AAGTTGAGAA 


CTGGTCAGGG 


CAAACCTGCC 


1019 


TCCCATTCTA 


CTCAAAGTCA 


TCCCTCTGTT 


CACAGAGATG 


GATGCATGTT 


CTGATTGCCT 


1079 


CTTTGGAGAA 


GCTCATCAGA 


AACTCAAAAG 


AAGGCCACTG 


TTTGTCTCAC 


CTACCCATGA 


1139 


CCTGAAGCCC 


CTCCCTGAGT 


GGTCCCCACC 


TTTCTGGACG 


GAACCACGTA 


CTTTTTACAT 


1199 


ACATTGATTC 


ATGTCTCACG 


TCTCCCTAAA 


AATGCGTAAG 


ACCAAGCTGT 


GCCCTGACCA 


1259 


CCCTGGGCCC 


CTGTCGTCAG 


GACCTCCTGA GGCTTTGGCA AATAAACCTC 


CTAAAATGAT 


1319 


AAAAAAAAAA 












1329 
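
The feature table for SEQ ID NO: 9 places the coding sequence at nucleotides 1..939, and each codon row of that listing is paired with its three-letter amino acid translation. As a minimal sketch of how such a pairing can be checked, the Python below translates a DNA string with the standard genetic code. The function name `translate` and the two short fragments used in the demonstration (the first six codons and the last four codons of the listed coding sequence) are illustrative choices, not part of the listing.

```python
# Minimal sketch: translate a DNA coding sequence with the standard genetic code.
# Helper names are illustrative assumptions, not taken from the sequence listing.

BASES = "TCAG"
# One-letter amino acids for the 64 codons in TCAG order (standard code, table 1).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def translate(dna):
    """Return three-letter residues for dna, stopping at the first stop codon."""
    seq = "".join(dna.upper().split())  # drop the spacing used in the listing
    residues = []
    for pos in range(0, len(seq) - len(seq) % 3, 3):
        aa = CODON_TABLE[seq[pos:pos + 3]]
        if aa == "*":  # stop codon ends the open reading frame
            break
        residues.append(THREE_LETTER[aa])
    return residues


if __name__ == "__main__":
    print(" ".join(translate("ATG GAT TTC GGA CTG GCC")))  # first six codons
    print(" ".join(translate("AGC CCC TCC TGA")))          # last codons and stop
```

Running the same function over a full row of codons gives a quick way to spot residues that were garbled in transcription, since the translation must agree with the codon above it.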



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 313 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

Met Asp Phe Gly Leu Ala Leu Leu Leu Ala Gly Leu Leu Gly Leu Leu 
-17 -15 -10 -5 

Leu Gly Gln Ser Leu Gln Val Lys Pro Leu Gln Val Glu Pro Pro Glu 
1 5 10 15 

Pro Val Val Ala Val Ala Leu Gly Ala Ser Arg Gln Leu Thr Cys Arg 
20 25 30 

Leu Ala Cys Ala Asp Arg Gly Ala Ser Val Gln Trp Arg Gly Leu Asp 
35 40 45 

Thr Ser Leu Gly Ala Val Gln Ser Asp Thr Gly Arg Ser Val Leu Thr 
50 55 60 

Val Arg Asn Ala Ser Leu Ser Ala Ala Gly Thr Arg Val Cys Val Gly 
65 70 75 

Ser Cys Gly Gly Arg Thr Phe Gln His Thr Val Gln Leu Leu Val Tyr 
80 85 90 95 

Ala Phe Pro Asp Gln Leu Thr Val Ser Pro Ala Ala Leu Val Pro Gly 
100 105 110 

Asp Pro Glu Val Ala Cys Thr Ala His Lys Val Thr Pro Val Asp Pro 
115 120 125 

Asn Ala Leu Ser Phe Ser Leu Leu Val Gly Gly Gln Glu Leu Glu Gly 
130 135 140 

Ala Gln Ala Leu Gly Pro Glu Val Gln Glu Ser Pro Asp Thr Thr Ser 
145 150 155 

Pro Glu Ser Pro Asp Thr Thr Ser Pro Glu Pro Pro Asp Thr Thr Ser 
160 165 170 175 

Pro Glu Pro Pro Asp Lys Thr Ser Pro Glu Pro Ala Pro Gln Gln Gly 
180 185 190 

Ser Thr His Thr Pro Arg Ser Pro Gly Ser Thr Arg Thr Arg Arg Pro 
195 200 205 

Glu Ile Ser Gln Ala Gly Pro Thr Gln Gly Glu Val Ile Pro Thr Gly 
210 215 220 

Ser Ser Lys Pro Ala Gly Asp Gln Leu Pro Ala Ala Leu Trp Thr Ser 
225 230 235 

Ser Ala Val Leu Gly Leu Leu Leu Leu Ala Leu Pro Thr Tyr His Leu 
240 245 250 255 

Trp Lys Arg Cys Arg His Leu Ala Glu Asp Asp Thr His Pro Pro Ala 
260 265 270 

Ser Leu Arg Leu Leu Pro Gln Val Ser Ala Trp Ala Gly Leu Arg Gly 
275 280 285 

Thr Gly Gln Val Gly Ile Ser Pro Ser 
290 295 

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: cDNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11 
CGCCCATGGG CCAGTCCCTC CAGGTG 
(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12 
CGCAAGCTTT CAGGGCAGCT GGTCACCCGC 
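
Every entry in the listing is written 5' to 3', so a primer that anneals to the coding strand has to be reverse-complemented before it can be compared with a cDNA sequence. The following Python is a minimal sketch of that step; `reverse_complement` is a hypothetical helper name. The observation that part of the result matches the stretch GCG GGT GAC CAG CTG CCC in SEQ ID NO: 9 comes from comparing the two listed sequences and is not a statement made in this section.

```python
# Minimal sketch: reverse-complement a primer written 5' to 3'.
# The function name is an illustrative assumption.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(primer):
    """Return the reverse complement of a DNA string, ignoring spacing."""
    seq = "".join(primer.upper().split())  # remove the blocks-of-ten spacing
    return seq.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    seq_id_12 = "CGCAAGCTTT CAGGGCAGCT GGTCACCCGC"
    rc = reverse_complement(seq_id_12)
    print(rc)
    # Check whether the primer's reverse complement contains a coding-strand stretch.
    print("GCGGGTGACCAGCTGCCC" in rc)
```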
(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13 
CGCGGATCCG CCATCATGGA TTTCGGACTG GCC 
(2) INFORMATION FOR SEQ ID NO : 14 : 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14 
CGCGGTACCT CACTTGAAGG GGTCCAAGC 
(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15 
CGCGGATCCG CCATCATGGA TTTCGGACTG GCC 
(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16 
CGCGGTACCT CAGGGCAGCT GGTCACCCGC 
(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17 

CGCGGATCCG CCATCATGGA TTTCGGACTG GCC 

(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 29 base pairs 
(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18 
CGCGGTACCT CACTTGAAGG GGTCCAAGC 
(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19 
CGCGGATCCG CCATCATGGA TTTCGGACTG GCC 
(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20 
CGCTCTAGAT CAAGCGTAGT CTCCGACGTC GTATGGGTA 
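
Several of the primers in SEQ ID NOS: 11 to 20 begin with CGC followed by a six-base palindrome. The sketch below scans a primer for a handful of well-known restriction enzyme recognition sequences. The recognition sequences in the table are the standard ones for those enzymes, but this section does not say which enzymes were used for cloning, so the enzyme assignments should be read as an assumption. `find_sites` is a hypothetical helper name.

```python
# Minimal sketch: report the first occurrence of common restriction sites in a primer.
# Which enzymes the authors used is NOT stated in this section; the table below
# only lists standard recognition sequences for illustration.

SITES = {
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "KpnI": "GGTACC",
    "NcoI": "CCATGG",
    "XbaI": "TCTAGA",
}


def find_sites(primer):
    """Map enzyme name to the 1-based start of its first site in the primer."""
    seq = "".join(primer.upper().split())
    hits = {}
    for enzyme, site in SITES.items():
        start = seq.find(site)
        if start != -1:
            hits[enzyme] = start + 1  # convert 0-based index to 1-based position
    return hits


if __name__ == "__main__":
    primers = {
        "SEQ ID NO: 11": "CGCCCATGGG CCAGTCCCTC CAGGTG",
        "SEQ ID NO: 12": "CGCAAGCTTT CAGGGCAGCT GGTCACCCGC",
        "SEQ ID NO: 13": "CGCGGATCCG CCATCATGGA TTTCGGACTG GCC",
        "SEQ ID NO: 14": "CGCGGTACCT CACTTGAAGG GGTCCAAGC",
        "SEQ ID NO: 20": "CGCTCTAGAT CAAGCGTAGT CTCCGACGTC GTATGGGTA",
    }
    for name, primer in primers.items():
        print(name, find_sites(primer))
```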
(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21 
CGCGGTACCT CAGGGCAGCT GGTCACCCGC 
(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22 
CGCTCTCCTT CTCCCTGCTC 
(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23 
TGGTGGGTGG GTGTCGTCCT C 
(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24 
CGGCAGCGTT TCCAGAGGTG ATAC 
(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25 
GGGACTGAGC ATGGATTTCG ACTGGCCCT 
(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26 
CGTACAGGCC ACCTCCGGGT CACCAGGCAC CA 
(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27 
GCTGGTCCGG GAAGGCGTAC ACAAGGAGCT GC 
(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:28: 
AAATAAA 

(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: mRNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 
RNNAUGR 
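
SEQ ID NO: 29 is written with IUPAC ambiguity codes, in which R stands for a purine (A or G) and N for any base. One way to search a transcript for such a motif is to expand each code into a regular-expression character class, as in the sketch below. The helper name `iupac_to_regex` and the test string (the start-codon region of the primer in SEQ ID NO: 13, rewritten as RNA) are illustrative assumptions.

```python
# Minimal sketch: turn an IUPAC-coded RNA motif into a regular expression.
import re

# Standard IUPAC nucleotide ambiguity codes, written for RNA (U instead of T).
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "[AG]", "Y": "[CU]", "S": "[CG]", "W": "[AU]",
    "K": "[GU]", "M": "[AC]", "B": "[CGU]", "D": "[AGU]",
    "H": "[ACU]", "V": "[ACG]", "N": "[ACGU]",
}


def iupac_to_regex(motif):
    """Compile an IUPAC motif such as RNNAUGR into a regex pattern."""
    return re.compile("".join(IUPAC_RNA[code] for code in motif.upper()))


if __name__ == "__main__":
    pattern = iupac_to_regex("RNNAUGR")
    # GCCATCATGGAT from the primer in SEQ ID NO: 13, with T replaced by U.
    rna = "GCCATCATGGAT".replace("T", "U")
    match = pattern.search(rna)
    if match:
        print(match.start() + 1, match.group())  # 1-based start and matched text
```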

(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 8 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(ix) FEATURE: 

(A) NAME/KEY: Modified-site 

(B) LOCATION: 7 

(D) OTHER INFORMATION: /note= "CAN BE EITHER PRO OR SER" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 

Asp Thr Thr Ser Pro Gly Xaa Pro 
1 5 
(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 

Pro Asp Thr Arg Pro Ala Pro Gly Ser Thr Ala Pro Pro Ala His Gly 
1 5 10 15 

Val Thr Ser Ala 
20 

(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 

Thr Thr Thr Pro Asp Val 
1 5 

(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 718 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:33: 
CTGCAGCTCC GGAACGGGGG GGGGCTGCTC TCCACCGCCC CTGTGCGGCC GCCCGGGAAA 60 

GTGCAGGCGG GCCGGGCGCG GTGGCTCACG CCTGTGATCT CAGCACTTTG GGAGGCCGAG 120 

GTGGGCGGAT CACCTGAGGT CGGGAGTTCG AGGCCAGCCT GCCCAACATG GAGAAACCCT 180 

GTCTCTACTA AAGATACAAA ATTAGCCAGG CGTGGTGACG CATGCCTGTA ATCCCAGCTA 240 

CTGGAGTGGC TGAGGCAGGA GAATCGCTTG AGCCCGGGAG ACAGAGGTTG CGGTGAGCTG 300 

AGATCGCACC ATTGCAACTC CAGCCTGGGC AACAAGAGCG AAACTCAGAA AAAAAAGAAA 360 

AGAAAGTGCA GGGGACCCGC CGTCGGGGTG GGGGCGGCGC TGCCCAGCCT CTGTCCCACT 420 

TCCATGCACT TGACCTCGAC CCTCCGGCCT CCGTCTGCGA TCTTCCCGTG CCTGAATATG 480 

AGGCTTGGAA CAGACCCAGA CCTTCCTGCC TGCCCGTCCT GAGTGGCCCC GGGACCCCGC 540 

CCCATCTTTG GCCCCCAGCC CCTGCCTTTT TGCCGCCTCC AGGGTCGGGG GTCAGGCCAG 600 

GAAAGCCCCT TGGGAAGCCC CCGGGGAGCA GCTGGAGCGG GGTCGCCGGG CGGCGGGAAG 660 

GAGTGGGCGC CTCTATTTAA GCGGCTTCCC CGCGGCCTCG GGACAGAGGG GACTGAGC 718 
(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 62 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 
ATGGATTTCG GACTGGCCCT CCTGCTGGCG GGGCTTCTGG GGCTCCTCCT CGGTGAGAAG 60 
GG 62 
(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 305 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 
GTCGCCGCAG GCCAGTCCCT CCAGGTGAAG CCCCTGCAGG TGGAGCCCCC GGAGCCGGTG 60 
GTGGCCGTGG CCTTGGGCGC CTCGCGCCAG CTCACCTGCC GCCTGGCCTG CGCGGACCGC 120 
GGGGCCTCGG TGCAGTGGCG GGGCCTGGAC ACCAGCCTGG GCGCGGTGCA GTCGGACACG 180 
GGCCGCAGCG TCCTCACCGT GCGCAACGCC TCGCTGTCGG CGGCCGGGAC CCGCGTGTGC 240 
GTGGGCTCCT GCGGGGGCCG CACCTTCCAG CACACCGTGC AGCTCCTTGT GTACGGTGAG 300 
GCGTC 305 

(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 350 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36: 

TCCATCACAG CCTTCCCGGA CCAGCTGACC GTCTCCCCAG CAGCCCTGGT GCCTGGTGAC 60 

CCGGAGGTGG CCTGTACGGC CCACAAAGTC ACGCCCGTGG ACCCCAACGC GCTCTCCTTC 120 

TCCCTGCTCG TCGGGGGCCA GGAACTGGAG GGGGCGCAAG CCCTGGGCCC GGAGGTGCAG 180 

GAGGAGGAGG AGGAGCCCCA GGGGGACGAG GACGTGCTGT TCAGGGTGAC AGAGCGCTGG 240 

CGGCTGCCGC CCCTGGGGAC CCCTGTCCCG CCCGCCCTCT ACTGCCAGGC CACGATGAGG 300 

CTGCCTGGCT TGGAGCTCAG CCACCGCCAG GCCATCCCCG GTGAGTCCGC 350 
(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 353 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37: 
CTGTTTCCAG TCCTGCACAG CCCGACCTCC CCGGAGCCTC CCGACACCAC CTCCCCGGAG 60 

CCTCCCAACA CCACCTCCCC GGAGTCTCCC GACACCACCT CCCCGGAGTC TCCCGACACC 120 

ACCTCCCAGG AGCCTCCCGA CACCACCTCC CAGGAGCCTC CCGACACCAC CTCCCAGGAG 180 

CCTCCCGACA CCACCTCCCC GGAGCCTCCC GACAAGACCT CCCCGGAGCC CGCCCCCCAG 240 

CAGGGCTCCA CACACACCCC CAGGAGCCCA GGCTCCACCA GGACTCGCCG CCCTGAGATC 300 

TCCCAGGCTG GGCCCACGCA GGGAGAAGTG ATCCCAACAG GCTGTGAGTT CTG 353 
(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 608 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38: 

CTCTCCCCAG CGTCCAAACC TGCGGGTGAC CAGCTGCCCG CGGCTCTGTG GACCAGCAGT 60 

GCGGTGCTGG GACTGCTGCT CCTGGCCTTG CCCACCTATC ACCTCTGGAA ACGCTGCCGG 120 

CACCTGGCTG AGGACGACAC CCACCCACCA GCTTCTCTGA GGCTTCTGCC CCAGGTGTCG 180 

GCCTGGGCTG GGTTAAGGGG GACCGGCCAG GTCGGGATCA GCCCCTCCTG AGTGGCCAGC 240 

CTTTCCCCCT GTGAAAGCAA AATAGCTTGG ACCCCTTCAA GTTGAGAACT GGTCAGGGCA 300 

AACCTGCCTC CCATTCTACT CAAAGTCATC CCTCTGTTCA CAGAGATGGA TGCATGTTCT 360 

GATTGCCTCT TTGGAGAAGC TCATCAGAAA CTCAAAAGAA GGCCACTGTT TGTCTCACCT 420 

ACCCATGACC TGAAGCCCCT CCCTGAGTGG TCCCCACCTT TCTGGACGGA ACCACGTACT 480 

TTTTACATAC ATTGATTCAT GTCTCACGTC TCCCTAAAAA TGCGTAAGAC CAAGCTGTGC 540 

CCTGACCACC CTGGGCCCCT GTCGTCAGGA CCTCCTGAGG CTTTGGCAAA TAAACCTCCT 600 

AAAATGAT 608 
(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:39: 
TGCGGTGCTG GGACTGCTGC TC 
(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:40: 
TCAGGGAGGG GCTTCAGGTC A 
(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41 
GTAATACGAC TCACTATAGG 
(2) INFORMATION FOR SEQ ID NO:42: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42: 
AGGGCCAGTC CGAAATCCAT GCTCAGTCCC 
(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43: 

Glu Asn Lys Cys Arg Glu 
1 5 

(2) INFORMATION FOR SEQ ID NO: 44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44: 
TATTTAA 

(2) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45 
CACCTG 
(2) INFORMATION FOR SEQ ID NO: 46: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 405 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:46: 

Met Glu Ser Ile Leu Ala Leu Leu Leu Ala Leu Ala Leu Val Pro Tyr 
1 5 10 15 

Gln Leu Ser Arg Gly Gln Ser Phe Gln Val Asn Pro Pro Glu Ser Glu 
20 25 30 

Val Ala Val Ala Met Gly Thr Ser Leu Gln Ile Thr Cys Ser Met Ser 
35 40 45 

Cys Asp Glu Gly Val Ala Arg Val His Trp Arg Gly Leu Asp Thr Ser 
50 55 60 

Leu Gly Ser Val Gln Thr Leu Pro Gly Ser Ser Ile Leu Ser Val Arg 
65 70 75 80 

Gly Met Leu Ser Asp Thr Gly Thr Pro Val Cys Val Gly Ser Cys Gly 
85 90 95 

Ser Arg Ser Phe Gln His Ser Val Lys Ile Leu Val Tyr Ala Phe Pro 
100 105 110 

Asp Gln Leu Val Val Ser Pro Glu Phe Leu Val Pro Gly Gln Asp Gln 
115 120 125 

Val Val Ser Cys Thr Ala His Asn Ile Trp Pro Ala Asp Pro Asn Ser 
130 135 140 

Leu Ser Phe Ala Leu Leu Leu Gly Glu Gln Arg Leu Glu Gly Ala Gln 
145 150 155 160 

Ala Leu Glu Pro Glu Gln Glu Glu Glu Ile Gln Glu Ala Glu Gly Thr 
165 170 175 

Pro Leu Phe Arg Met Thr Gln Arg Trp Arg Leu Pro Ser Leu Gly Thr 
180 185 190 

Pro Ala Pro Pro Ala Leu His Cys Gln Val Thr Met Gln Leu Pro Lys 
195 200 205 

Leu Val Leu Thr His Arg Lys Glu Ile Pro Val Leu Gln Ser Gln Thr 
210 215 220 

Ser Pro Lys Pro Pro Asn Thr Thr Ser Ala Glu Pro Tyr Ile Leu Thr 
225 230 235 240 

Ser Ser Ser Thr Ala Glu Ala Val Ser Thr Gly Leu Asn Ile Thr Thr 
245 250 255 

Leu Pro Ser Ala Pro Pro Tyr Pro Lys Leu Ser Pro Arg Thr Leu Ser 
260 265 270 

Ser Glu Gly Pro Cys Arg Pro Lys Ile His Gln Asp Leu Glu Ala Gly 
275 280 285 

Trp Glu Leu Leu Cys Glu Ala Ser Cys Gly Pro Gly Val Thr Val Arg 
290 295 300 

Trp Thr Leu Ala Pro Gly Asp Leu Ala Thr Tyr His Lys Arg Glu Ala 
305 310 315 320 

Gly Ala Gln Ala Trp Leu Ser Val Leu Pro Pro Gly Pro Met Val Glu 
325 330 335 

Gly Trp Phe Gln Cys Arg Gln Asp Pro Gly Gly Glu Val Thr Asn Leu 
340 345 350 

Tyr Val Pro Gly Gln Val Thr Pro Asn Ser Ser Ser Thr Val Val Leu 
355 360 365 

Trp Ile Gly Ser Leu Val Leu Gly Leu Leu Ala Leu Val Phe Leu Ala 
370 375 380 

Tyr Arg Leu Trp Lys Cys Tyr Arg Pro Gly Pro Arg Pro Asp Thr Ser 
385 390 395 400 

Ser Cys Thr His Leu 
405 

(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 406 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:47: 

Met Asp Phe Gly Leu Ala Leu Leu Leu Ala Gly Leu Leu Gly Leu Leu 
1 5 10 15 

Leu Gly Gln Ser Leu Gln Val Lys Pro Leu Gln Val Glu Pro Pro Glu 
20 25 30 

Pro Val Val Ala Val Ala Leu Gly Ala Ser Arg Gln Leu Thr Cys Arg 
35 40 45 

Leu Ala Cys Ala Asp Arg Gly Ala Ser Val Gln Trp Arg Gly Leu Asp 
50 55 60 

Thr Ser Leu Gly Ala Val Gln Ser Asp Thr Gly Arg Ser Val Leu Thr 
65 70 75 80 

Val Arg Asn Ala Ser Leu Ser Ala Ala Gly Thr Arg Val Cys Val Gly 
85 90 95 

Ser Cys Gly Gly Arg Thr Phe Gln His Thr Val Gln Leu Leu Val Tyr 
100 105 110 

Ala Phe Pro Asp Gln Leu Thr Val Ser Pro Ala Ala Leu Val Pro Gly 
115 120 125 

Asp Pro Glu Val Ala Cys Thr Ala His Lys Val Thr Pro Val Asp Pro 
130 135 140 

Asn Ala Leu Ser Phe Ser Leu Leu Val Gly Gly Gln Glu Leu Glu Gly 
145 150 155 160 

Ala Gln Ala Leu Gly Pro Glu Val Gln Glu Glu Glu Glu Glu Pro Gln 
165 170 175 

Gly Asp Glu Asp Val Leu Phe Arg Val Thr Glu Arg Trp Arg Leu Pro 
180 185 190 

Pro Leu Gly Thr Pro Val Pro Pro Ala Leu Tyr Cys Gln Ala Thr Met 
195 200 205 

Arg Leu Pro Gly Leu Glu Leu Ser His Arg Gln Ala Ile Pro Val Leu 
210 215 220 

His Ser Pro Thr Ser Pro Glu Pro Pro Asp Thr Thr Ser Pro Glu Pro 
225 230 235 240 

Pro Asn Thr Thr Ser Pro Glu Ser Pro Asp Thr Thr Ser Pro Glu Ser 
245 250 255 

Pro Asp Thr Thr Ser Gln Glu Pro Pro Asp Thr Thr Ser Gln Glu Pro 
260 265 270 

Pro Asp Thr Thr Ser Gln Glu Pro Pro Asp Thr Thr Ser Pro Glu Pro 
275 280 285 

Pro Asp Lys Thr Ser Pro Glu Pro Ala Pro Gln Gln Gly Ser Thr His 
290 295 300 

Thr Pro Arg Ser Pro Gly Ser Thr Arg Thr Arg Arg Pro Glu Ile Ser 
305 310 315 320 

Gln Ala Gly Pro Thr Gln Gly Glu Val Ile Pro Thr Gly Ser Ser Lys 
325 330 335 

Pro Ala Gly Asp Gln Leu Pro Ala Ala Leu Trp Thr Ser Ser Ala Val 
340 345 350 

Leu Gly Leu Leu Leu Leu Ala Leu Pro Thr Tyr His Leu Trp Lys Arg 
355 360 365 

Cys Arg His Leu Ala Glu Asp Asp Thr His Pro Pro Ala Ser Leu Arg 
370 375 380 

Leu Leu Pro Gln Val Ser Ala Trp Ala Gly Leu Arg Gly Thr Gly Gln 
385 390 395 400 

Val Gly Ile Ser Pro Ser 
405 

(2) INFORMATION FOR SEQ ID NO: 48: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 408 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:48: 
CCCCTCTGCC GCCCTCATGC GGCCACCTGT GGAAGTGAAG GCACAGCTCT AGTCAGCGAG 60 
GTGGGCGGGG CAACCTAGGA CTGGCAGATT TCCATGCACT TGACCCACCA TGGTGACCCA 120 
CCTCCAGCTT TTAGCTTCAG CCTTCCCGTA CATAGAACCG GGGCCTGGAA CCTTCCCAGA 180 
CCTTCCCTCC CCATCTGTAA TGACTGTGTT CCCGGGTCCC TGCCTCACCT CTAGCCTCTG 240 
ATTCTCTGCC TCCTACAAAG TGGGGGTCGG GCTGGGAAAG CCCCCTGGGA AAGTCCCACA 300 
GAGCCGGCAG AAGGGGGAGG AGAGGCAGGG TCTCAGACAG TAGGAAGCTG CCGGCCCACT 360 
CTTATTTAAG CCGCTTCCCC TGGCGGTCAC AAGACAGAGG CAGGCATG 408 
(2) INFORMATION FOR SEQ ID NO: 49: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49: 

Ser Pro Thr Ser Pro Glu Pro Pro 
1 5 

(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 8 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50: 

Asp Thr Thr Ser Pro Glu Ser Pro 
1 5 

(2) INFORMATION FOR SEQ ID NO: 51: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 8 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51 

Asp Thr Thr Ser Pro Glu Pro Pro 
1 5 

(2) INFORMATION FOR SEQ ID NO: 52: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 8 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52: 

Asp Lys Thr Ser Pro Glu Pro Ala 
1 5 

(2) INFORMATION FOR SEQ ID NO: 53: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53: 

Thr Ser Pro Glu Pro Pro Asp Thr Thr Ser Pro Glu Ser Pro Asp Thr 
1 5 10 15 

Thr Ser Pro Glu Ser Pro Asp Thr Thr Ser Pro Glu Pro Pro Asp Thr 
20 25 30 

Thr Ser Pro Glu Pro Pro Asp Lys Thr Ser Pro Glu Pro 
35 40 45 

(2) INFORMATION FOR SEQ ID NO: 54: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54: 

Thr Pro Pro Pro Thr Thr Pro Thr Thr Pro Pro Thr Thr Pro Pro Thr 
1 5 10 15 

Thr Pro Pro Pro Thr Pro 
20 
(2) INFORMATION FOR SEQ ID NO: 55: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55: 

Thr Thr Pro Ser Pro Pro Thr Thr Thr Thr Thr Thr Pro Pro Pro Thr 
1 5 10 15 

Thr Thr Pro Ser Pro Pro Ile Thr Thr Thr Thr Thr Pro Pro Pro Thr 
20 25 30 

Thr Thr Pro Ser Pro Pro Ile Ser Thr Thr Thr Thr Pro 
35 40 45 

(2) INFORMATION FOR SEQ ID NO: 56: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 40 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56: 
CGGGGGCCAG GAACTGGAGG CGCCCCCCAG CAGGGCTCCA 
(2) INFORMATION FOR SEQ ID NO: 57: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 40 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57: 
GGGCCCGGAG GTGCAGGAGG CTCCCCGGAG TCTCCCGACA 
(2) INFORMATION FOR SEQ ID NO: 58: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 40 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58 
GGTGCAGGAG GAGGAGGAGG CTCCCCGGAG CCTCCCGACA 
(2) INFORMATION FOR SEQ ID NO: 59: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 40 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59 
CTCCCCGGAG CCTCCCGACA CTCCCCGGAG CCTCCCGACA 






INDICATIONS RELATING TO A DEPOSITED MICROORGANISM 

(PCT Rule 13bis) 

A. The indications made below relate to the microorganism referred to in the description 
on page 4, line 29. 

B. IDENTIFICATION OF DEPOSIT 

Further deposits are identified on an additional sheet 

Name of depositary institution 

AMERICAN TYPE CULTURE COLLECTION 

Address of depositary institution (including postal code and country) 

12301 Parklawn Drive 
Rockville, Maryland 20852 
United States of America 

Date of deposit 
October 10, 1996 

Accession Number 
ATCC 97758 

C. ADDITIONAL INDICATIONS (leave blank if not applicable) This information is continued on an additional sheet 

Phage library, PF291 

D. DESIGNATED STATES FOR WHICH INDICATIONS ARE MADE (if the indications are not for all designated States) 

E. SEPARATE FURNISHING OF INDICATIONS (leave blank if not applicable) 

The indications listed below will be submitted to the International Bureau later (specify the general nature of the indications, e.g., "Accession Number of Deposit") 

For receiving Office use only 

For International Bureau use only 

This sheet was received by the International Bureau on: 

Authorized officer 

Form PCT/RO/134 (July 1992) 
INDICATIONS RELATING TO A DEPOSITED MICROORGANISM 

(PCT Rule 13bis) 

A. The indications made below relate to the microorganism referred to in the description 
on page 11, line 2^ 

B. IDENTIFICATION OF DEPOSIT 

Further deposits are identified on an additional sheet 

Name of depositary institution 

AMERICAN TYPE CULTURE COLLECTION 

Address of depositary institution (including postal code and country) 

12301 Parklawn Drive 
Rockville, Maryland 20852 
United States of America 

Date of deposit 
October 10, 1996 

Accession Number 
ATCC 97759 

C. ADDITIONAL INDICATIONS (leave blank if not applicable) This information is continued on an additional sheet 

DNA Plasmid, 1321789 

D. DESIGNATED STATES FOR WHICH INDICATIONS ARE MADE (if the indications are not for all designated States) 

E. SEPARATE FURNISHING OF INDICATIONS (leave blank if not applicable) 

The indications listed below will be submitted to the International Bureau later (specify the general nature of the indications, e.g., "Accession Number of Deposit") 

For receiving Office use only 

This sheet was received with the international application 

For International Bureau use only 

This sheet was received by the International Bureau on: 

Authorized officer 

Form PCT/RO/134 (July 1992) 
What Is Claimed Is: 

1. An isolated nucleic acid molecule comprising a polynucleotide 
having a nucleotide sequence at least 95% identical to a sequence selected from 
the group consisting of: 

(a) a nucleotide sequence encoding the MAdCAM-1 
polypeptide having the complete amino acid sequence in FIG. 1 (SEQ ID NO:2), 
FIG. 2 (SEQ ID NO:4), FIG. 3 (SEQ ID NO:6), FIG. 4 (SEQ ID NO:8) or 
FIG. 5 (SEQ ID NO:10); 

(b) a nucleotide sequence encoding the mature MAdCAM-1 
polypeptide having the amino acid sequence at positions 18-382 in FIG. 1 (SEQ 
ID NO:2), positions 18-366 in FIG. 2 (SEQ ID NO:4), positions 18-263 in FIG. 
3 (SEQ ID NO:6), positions 18-310 in FIG. 4 (SEQ ID NO:8), or positions 
18-289 in FIG. 5 (SEQ ID NO:10); 

(c) a nucleotide sequence encoding the extracellular domain 
of any of the MAdCAM-1 polypeptides (MAdCAM-1(a-e)); 

(d) a nucleotide sequence encoding the intracellular domain 
of any of the MAdCAM-1 polypeptides (MAdCAM-1(a-e)); 

(e) a nucleotide sequence encoding the transmembrane domain 
of any of the MAdCAM-1 polypeptides (MAdCAM-1(a-e)); 

(f) a nucleotide sequence comprising the MAdCAM-1 
promoter, wherein the nucleotide sequence is given in SEQ ID NO:33; 

(g) a nucleotide sequence encoding exon 1, 2, 3, 4 or 5 of 
MAdCAM-1, having the sequence given in SEQ ID NOS:34, 35, 36, 37 and 38, 
respectively; and 

(h) a nucleotide sequence complementary to any of the 

nucleotide sequences in (a), (b), (c), (d), (e), (f) or (g). 
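
Claim 1 is framed in terms of a nucleotide sequence "at least 95% identical" to a reference sequence. How identity is to be computed (alignment program, gap handling) is not defined in this section, so the following is only a minimal sketch that assumes two already-aligned sequences of equal length and counts matching positions. `percent_identity` and `meets_threshold` are hypothetical names, and the 33-base reference used in the demonstration is the primer of SEQ ID NO: 13, chosen purely for brevity.

```python
# Minimal sketch: ungapped percent identity between two equal-length sequences.
# Assumes the sequences are already aligned; real comparisons normally need an
# alignment step first, which this section does not specify.


def percent_identity(seq_a, seq_b):
    """Percentage of positions at which two equal-length sequences agree."""
    a = "".join(seq_a.upper().split())
    b = "".join(seq_b.upper().split())
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def meets_threshold(seq_a, seq_b, threshold=95.0):
    """True when the ungapped identity is at least the given percentage."""
    return percent_identity(seq_a, seq_b) >= threshold


if __name__ == "__main__":
    reference = "CGCGGATCCG CCATCATGGA TTTCGGACTG GCC"
    ref = "".join(reference.split())
    one_change = "A" + ref[1:]    # a single substitution at position 1
    two_changes = "AA" + ref[2:]  # substitutions at positions 1 and 2
    for candidate in (ref, one_change, two_changes):
        print(round(percent_identity(ref, candidate), 2), meets_threshold(ref, candidate))
```

With a 33-base reference, one mismatch still clears a 95% threshold while two do not, which illustrates how sensitive the criterion is for short sequences.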
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2. The nucleic acid molecule of claim 1 wherein said polynucleotide 
has the complete nucleotide sequence in FIG. 1 (SEQ ID NO: 1), FIG. 2 (SEQ ID 
NO: 3), FIG. 3 (SEQ ID N0:5), FIG. 4 (SEQ ID N0:7), FIG. 5 (SEQ ID N0:9), 
SEQ ID NO:33, SEQ ID NO:34, SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO: 
37, or SEQ ID NO:38. 

3. The nucleic acid molecule of claim 1 wherein said polynucleotide 
has the nucleotide sequence in FIG. 1 (SEQ ID NO:1) encoding the MAdCAM-1(a) 
polypeptide having the complete amino acid sequence in FIG. 1 (SEQ ID 
NO:2), the nucleotide sequence in FIG. 2 (SEQ ID NO:3) encoding the 
MAdCAM-1(b) polypeptide having the complete amino acid sequence in FIG. 2 
(SEQ ID NO:4), the nucleotide sequence in FIG. 3 (SEQ ID NO:5) encoding 
the MAdCAM-1(c) polypeptide having the complete amino acid sequence in FIG. 3 
(SEQ ID NO:6), the nucleotide sequence in FIG. 4 (SEQ ID NO:7) encoding 
the MAdCAM-1(d) polypeptide having the complete amino acid sequence in 
FIG. 4 (SEQ ID NO:8), or the nucleotide sequence in FIG. 5 (SEQ ID NO:9) 
encoding the MAdCAM-1(e) polypeptide having the complete amino acid 
sequence in FIG. 5 (SEQ ID NO:10). 

4. The nucleic acid molecule of claim 1 wherein said polynucleotide 
has the nucleotide sequence in FIG. 1 (SEQ ID NO:1) encoding the mature 
MAdCAM-1(a) polypeptide having the amino acid sequence in FIG. 1 (SEQ ID 
NO:2), the nucleotide sequence in FIG. 2 (SEQ ID NO:3) encoding the mature 
MAdCAM-1(b) polypeptide having the amino acid sequence in FIG. 2 (SEQ ID 
NO:4), the nucleotide sequence in FIG. 3 (SEQ ID NO:5) encoding the mature 
MAdCAM-1(c) polypeptide having the amino acid sequence in FIG. 3 (SEQ ID 
NO:6), the nucleotide sequence in FIG. 4 (SEQ ID NO:7) encoding the mature 
MAdCAM-1(d) polypeptide having the amino acid sequence in FIG. 4 (SEQ ID 
NO:8), or the nucleotide sequence in FIG. 5 (SEQ ID NO:9) encoding the mature 
MAdCAM-1(e) polypeptide having the amino acid sequence in FIG. 5 (SEQ ID 
NO:10). 

5. An isolated nucleic acid molecule comprising a polynucleotide 
which hybridizes under stringent hybridization conditions to a polynucleotide 
having a nucleotide sequence identical to a nucleotide sequence in (a), (b), (c), 
(d), (e), (f), or (g) of claim 1 wherein said polynucleotide which hybridizes does 
not hybridize under stringent hybridization conditions to a polynucleotide having 
a nucleotide sequence consisting of only A residues or of only T residues. 

6. An isolated nucleic acid molecule comprising a polynucleotide 
which encodes the amino acid sequence of an epitope-bearing portion of any of 
the MAdCAM-1(a-e) polypeptides having an amino acid sequence in (a), (b), (c), 
(d), (e) or (g) of claim 1. 

7. The isolated nucleic acid molecule of claim 6, which encodes an 
epitope-bearing portion of any of the MAdCAM-1(a-e) polypeptides selected 
from the group consisting of: a polypeptide comprising amino acid residues from 
about 52 to about 80 in FIG. 1 (SEQ ID NO:2); a polypeptide comprising amino 
acid residues from about 164 to about 196 in FIG. 1 (SEQ ID NO:2); and a 
polypeptide comprising amino acid residues from about 228 to about 321 in FIG. 1 
(SEQ ID NO:2). 

8. An isolated nucleic acid molecule comprising a polynucleotide 
encoding the MAdCAM-1(a) polypeptide having the complete amino acid 
sequence encoded by the cDNA clone contained in ATCC Deposit No. 97759. 

9. An isolated MAdCAM-1(a) polypeptide having an amino acid 
sequence at least 95% identical to the amino acid sequence of the MAdCAM-1(a) 
polypeptide having the complete amino acid sequence encoded by the cDNA 
clone contained in ATCC Deposit No. 97759. 

10. An isolated nucleic acid molecule comprising a polynucleotide 
comprising the MAdCAM-1 promoter having the nucleotide sequence of the 
genomic clone contained in ATCC Deposit No. 97758. 

11. A method for making a recombinant vector comprising inserting 
an isolated nucleic acid molecule of claim 1 into a vector. 

12. A recombinant vector produced by the method of claim 11. 

13. A method of making a recombinant host cell comprising 
introducing the recombinant vector of claim 12 into a host cell. 

14. A recombinant host cell produced by the method of claim 13. 

15. A recombinant method for producing any of the MAdCAM-1(a-e) 
polypeptides, comprising culturing the recombinant host cell of claim 14 under 
conditions such that said polypeptide is expressed and recovering said 
polypeptide. 

16. An isolated MAdCAM-1 polypeptide having an amino acid 
sequence at least 95% identical to a sequence selected from the group consisting 
of: 

(a) the amino acid sequence of the MAdCAM-1 polypeptide 
having the complete amino acid sequence in FIG. 1 (SEQ ID NO:2), FIG. 2 (SEQ 
ID NO:4), FIG. 3 (SEQ ID NO:6), FIG. 4 (SEQ ID NO:8) or FIG. 5 (SEQ ID 
NO:10); 

(b) the amino acid sequence of the mature MAdCAM-1 
polypeptide having the amino acid sequence at positions 18-382 in FIG. 1 (SEQ 
ID NO:2), positions 18-366 in FIG. 2 (SEQ ID NO:4), positions 18-263 in FIG. 3 
(SEQ ID NO:6), positions 18-310 in FIG. 4 (SEQ ID NO:8), or positions 18-289 
in FIG. 5 (SEQ ID NO:10); 

(c) the amino acid sequence of the extracellular domain of any 
of the MAdCAM-1 polypeptides (MAdCAM-1(a-e)); 

(d) the amino acid sequence of the intracellular domain of any 
of the MAdCAM-1 polypeptides (MAdCAM-1(a-e)); 

(e) the amino acid sequence of the transmembrane domain of 
any of the MAdCAM-1 polypeptides (MAdCAM-1(a-e)); 

(f) the amino acid sequence encoded by exon 1, 2, 3, 4 or 5 of 
MAdCAM-1, wherein said amino acid sequence is encoded by the nucleotide 
sequence given in SEQ ID NOS:34, 35, 36, 37, 38, or 39, respectively; and 

(g) the amino acid sequence of an epitope-bearing portion of 
any one of the polypeptides of (a), (b), (c), (d), (e) or (f). 

17. An isolated polypeptide comprising an epitope-bearing portion of 
any of the MAdCAM-1 proteins (MAdCAM-1(a-e)), wherein said portion is 
selected from the group consisting of: a polypeptide comprising amino acid 
residues from about 52 to about 80 in FIG. 1 (SEQ ID NO:2); a polypeptide 
comprising amino acid residues from about 164 to about 196 in FIG. 1 (SEQ ID 
NO:2); and a polypeptide comprising amino acid residues from about 228 to 
about 321 in FIG. 1 (SEQ ID NO:2). 



18. An isolated antibody that binds specifically to a MAdCAM-1 
polypeptide of claim 16. 

19. A method for treating an individual in need of a reduction in 
MAdCAM-1(a-e) activity, comprising administering to said individual a 
therapeutically effective amount of a composition comprising an antagonist of 
MAdCAM-1(a-e) activity. 

20. A method useful during the diagnosis of cancer or of a 
pathological inflammatory condition, comprising: 

(a) assaying the expression level of any of MAdCAM-1(a-e) 
in mammalian cells or body fluid; and 

(b) comparing said expression level of any of MAdCAM-1(a-e) 
with a standard expression level of any of MAdCAM-1(a-e), whereby an 
increase in said expression level of any of MAdCAM-1(a-e) over said standard 
is indicative of cancer or of a pathological inflammatory condition. 

21. A recombinant vector comprising a recombinant nucleic acid 
molecule comprising the 5' flanking region (SEQ ID NO:33), including the 
promoter, of MAdCAM-1, and a reporter gene, wherein the 5' flanking region is 
operably linked to the reporter gene. 

22. A recombinant host cell comprising the vector of claim 21. 

23. A method for the identification of substances capable of altering 
the expression from the MAdCAM-1 promoter, comprising: 

(a) measuring the level of expression of a reporter gene in a 
test cell, wherein said test cell is transformed with a recombinant DNA molecule 
comprising a reporter gene operably linked to a DNA molecule comprising the 
MAdCAM-1 promoter, and wherein a candidate MAdCAM-1 trans-acting agent 
is administered to said test cell; 

(b) measuring the level of expression of said reporter gene in 
a control cell, wherein said control cell is transformed with the recombinant DNA 
molecule of step (a); and 

(c) comparing the level of expression of said reporter gene in 
said test cell to the level of said reporter gene in said control cell. 




10 30 50 

atggatttcggactggccctcctgctggcggggcttctggggctcctcctcGGCCAGTCC 
MDFGLALLLAGLLGLLLGQS 

70 90 110 

CTCCAGGTGAAGCCCCTGCAGGTGGAGCCCCCGGAGCCGGTGGTGGCCGTGGCCTTGGGC 
LQVKPLQVEPPEPVVAVALG 

130 150 170 

GCCTCGCGCCAGCTCACCTGCCGCCTGGCCTGCGCGGACCGCGGGGCCTCGGTGCAGTGG 
ASRQLTCRLACADRGASVQW 

190 210 230 

CGGGGCCTGGACACCAGCCTGGGCGCGGTGCAGTCGGACACGGGCCGCAGCGTCCTCACC 
RGLDTSLGAVQSDTGRSVLT 

250 270 290 

GTGCGCAACGCCTCGCTGTCGGCGGCCGGGACCCGCGTGTGCGTGGGCTCCTGCGGGGGC 
VRNASLSAAGTRVCVGSCGG 

310 330 350 

CGCACCTTCCAGCACACCGTGCAGCTCCTTGTGTACGCCTTCCCGGACCAGCTGACCGTC 
RTFQHTVQLLVYAFPDQLTV 

370 390 410 

TCCCCAGCAGCCCTGGTGCCTGGTGACCCGGAGGTGGCCTGTACGGCCCACAAAGTCACG 
SPAALVPGDPEVACTAHKVT 

430 450 470 

CCCGTGGACCCCAACGCGCTCTCCTTCTCCCTGCTCGTCGGGGGCCAGGAACTGGAGGGG 
PVDPNALSFSLLVGGQELEG 

490 510 530 

GCGCAAGCCCTGGGCCCGGAGGTGCAGGAGGAGGAGGAGGAGCCCCAGGGGGACGAGGAC 
AQALGPEVQEEEEEPQGDED 

550 570 590 

GTGCTGTTCAGGGTGACAGAGCGCTGGCGGCTGCCGCCCCTGGGGACCCCTGTCCCGCCC 
VLFRVTERWRLPPLGTPVPP 

610 630 650 

GCCCTCTACTGCCAGGCCACGATGAGGCTGCCTGGCTTGGAGCTCAGCCACCGCCAGGCC 
ALYCQATMRLPGLELSHRQA 

670 690 710 

ATCCCCGTCCTGCACAGCCCGACCTCCCCGGAGCCTCCCGACACCACCTCCCCGGAGTCT 
IPVLHSPTSPEPPDTTSPES 

730 750 770 

CCCGACACCACCTCCCCGGAGTCTCCCGACACCACCTCCCCGGAGCCTCCCGACACCACC 
PDTTSPESPDTTSPEPPDTT 

790 810 830 

TCCCCGGAGCCTCCCGACAAGACCTCCCCGGAGCCCGCCCCCCAGCAGGGCTCCACACAC 
SPEPPDKTSPEPAPQQGSTH 

FIG. 1A 

850 870 890 

ACCCCCAGGAGCCCAGGCTCCACCAGGACTCGCCGCCCTGAGATCTCCCAGGCTGGGCCC 
TPRSPGSTRTRRPEISQAGP 

910 930 950 

ACGCAGGGAGAAGTGATCCCAACAGGCTCGTCCAAACCTGCGGGTGACCAGCTGCCCGCG 
TQGEVIPTGSSKPAGDQLPA 

970 990 1010 

GCTCTGTGGACCAGCAGTGCGGTGCTGGGACTGCTGCTCCTGGCCTTGCCCACGTATCAC 
ALWTSSAVLGLLLLALPTYH 

1030 1050 1070 

CTCTGGAAACGCTGCCGGCACCTGGCTGAGGACGACACCCACCCACCAGCTTCTCTGAGG 
LWKRCRHLAEDDTHPPASLR 

1090 1110 1130 

CTTCTGCCCCAGGTGTCGGCCTGGGCTGGGTTAAGGGGGACCGGCCAGGTCGGGATCAGC 
LLPQVSAWAGLRGTGQVGIS 

1150 1170 1190 

CCCTCCTGAGTGGCCAGCCTTTCCCCCTGTGAAAGCAAAATAGCTTGGACCCCTTCAAGT 
PS 

1210 1230 1250 

TGAGAACTGGTCAGGGCAAACCTGCCTCCCATTCTACTCAAAGTCATCCCTCTGTTCACA 

1270 1290 1310 

GAGATGGATGCATGTTCTGATTGCCTCTTTGGAGAAGCTCATCAGAAACTCAAAAGAAGG 

1330 1350 1370 

CCACTGTTTGTCTCACCTACCCATGACCTGAAGCCCCTCCCTGAGTGGTCCCCACCTTTC 

1390 1410 1430 

TGGACGGAACCACGTACTTTTTACATACATTGATTCATGTCTCACGTCTCCCTAAAAATG 

1450 1470 1490 

CGTAAGACCAAGCTGTGCCCTGACCACCCTGGGCCCCTGTCGTCAGGACCTCCTGAGGCT 

1510 1530 

TTGGCAAATAAACCTCCTAAAATGATAAAAAAAAAA 

FIG. 1B 

10 30 50 

atggatttcggactggccctcctgctggcggggcttctggggctcctcctcGGCCAGTCC 
MDFGLALLLAGLLGLLLGQS 

70 90 110 

CTCCAGGTGAAGCCCCTGCAGGTGGAGCCCCCGGAGCCGGTGGTGGCCGTGGCCTTGGGC 
LQVKPLQVEPPEPVVAVALG 

130 150 170 

GCCTCGCGCCAGCTCACCTGCCGCCTGGCCTGCGCGGACCGCGGGGCCTCGGTGCAGTGG 
ASRQLTCRLACADRGASVQW 

190 210 230 

CGGGGCCTGGACACCAGCCTGGGCGCGGTGCAGTCGGACACGGGCCGCAGCGTCCTCACC 
RGLDTSLGAVQSDTGRSVLT 

250 270 290 

GTGCGCAACGCCTCGCTGTCGGCGGCCGGGACCCGCGTGTGCGTGGGCTCCTGCGGGGGC 
VRNASLSAAGTRVCVGSCGG 

310 330 350 

CGCACCTTCCAGCACACCGTGCAGCTCCTTGTGTACGCCTTCCCGGACCAGCTGACCGTC 
RTFQHTVQLLVYAFPDQLTV 

370 390 410 

TCCCCAGCAGCCCTGGTGCCTGGTGACCCGGAGGTGGCCTGTACGGCCCACAAAGTCACG 
SPAALVPGDPEVACTAHKVT 

430 450 470 

CCCGTGGACCCCAACGCGCTCTCCTTCTCCCTGCTCGTCGGGGGCCAGGAACTGGAGGGG 

GGGACTGAGC -1 

ATGGATTTCGGACTGGCCCTCCTGCTGGCGGGGCTTCTGGGGCTCCTCCTCGGCCAGTCCCTCCAGGTGA 70 

MDFGLALLLAGLLGLLL GQSLQVK 24 

AGCCCCTGCAGGTGGAGCCCCCGGAGCCGGTGGTGGCCGTGGCCTTGGGCGCCTCGCGCCAGCTCACCTG 140 

PLQVEPPEPVVAVALGASRQLTC 47 

CCGCCTGGCCTGCGCGGACCGCGGGGCCTCGGTGCAGTGGCGGGGCCTGGACACCAGCCTGGGCGCGGTG 210 

RLACADRGASVQWRGLDTSLGAV 70 

CAGTCGGACACGGGCCGCAGCGTCCTCACCGTGCGCAACGCCTCGCTGTCGGCGGCCGGGACCCGCGTGT 280 

QSDTGRSVLTVRNASLSAAGTRVC 94 

GCGTGGGCTCCTGCGGGGGCCGCACCTTCCAGCACACCGTGCAGCTCCTTGTGTACGCCTTCCCGGACCA 350 

VGSCGGRTFQHTVQLLVYAFPDQ 117 

GCTGACCGTCTCCCCAGCAGCCCTGGTGCCTGGTGACCCGGAGGTGGCCTGTACGGCCCACAAAGTCACG 420 

LTVSPAALVPGDPEVACTAHKVT 140 

CCCGTGGACCCCAACGCGCTCTCCTTCTCCCTGCTCGTCGGGGGCCAGGAACTGGAGGGGGCGCAAGCCC 490 

PVDPNALSFSLLVGGQELEGAQAL 164 

TGGGCCCGGAGGTGCAGGAGGAGGAGGAGGAGCCCCAGGGGGACGAGGACGTGCTGTTCAGGGTGACAGA 560 

GPEVQEEEEEPQGDEDVLFRVTE 187 

GCGCTGGCGGCTGCCGCCCCTGGGGACCCCTGTCCCGCCCGCCCTCTACTGCCAGGCCACGATGAGGCTG 630 

RWRLPPLGTPVPPALYCQATMRL 210 

CCTGGCTTGGAGCTCAGCCACCGCCAGGCCATCCCCGTCCTGCACAGCCCGACCTCCCCGGAGCCTCCCG 700 

PGLELSHRQAIPVLHSPTSPEPPD 234 

ACACCACCTCCCCGGAGTCTCCCGACACCACCTCCCCGGAGTCTCCCGACACCACCTCCCCGGAGCCTCC 770 

TTSPESPDTTSPESPDTTSPEPP 257 

CGACACCACCTCCCCGGAGCCTCCCGACAAGACCTCCCCGGAGCCCGCCCCCCAGCAGGGCTCCACACAC 840 

DTTSPEPPDKTSPEPAPQQGSTH 280 

ACCCCCAGGAGCCCAGGCTCCACCAGGACTCGCCGCCCTGAGATCTCCCAGGCTGGGCCCACGCAGGGAG 910 

TPRSPGSTRTRRPEISQAGPTQGE 304 

AAGTGATCCCAACAGGCTCGTCCAAACCTGCGGGTGACCAGCTGCCCGCGGCTCTGTGGACCAGCAGTGC 980 

VIPTGSSKPAGDQLPAALWTSSA 327 

GGTGCTGGGACTGCTGCTCCTGGCCTTGCCCACGTATCACCTCTGGAAACGCTGCCGGCACCTGGCTGAG 1050 

VLGLLLLALPTYHLWKRCRHLAE 350 

GACGACACCCACCCACCAGCTTCTCTGAGGCTTCTGCCCCAGGTGTCGGCCTGGGCTGGGTTAAGGGGGA 1120 

DDTHPPASLRLLPQVSAWAGLRGT 374 

FIG. 9B 

CCGGCCAGGTCGGGATCAGCCCCTCCTGAGTGGCCAGCCTTTCCCCCTGTGAAAGCAAAATAGCTTGGAC 1190 

GQVGISPS 382 

CCCTTCAAGTTGAGAACTGGTCAGGGCAAACCTGCCTCCCATTCTACTCAAAGTCATCCCTCTGTTCACA 1260 

GAGATGGATGCATGTTCTGATTGCCTCTTTGGAGAAGCTCATCAGAAACTCAAAAGAAGGCCACTGTTTG 1330 

TCTCACCTACCCATGACCTGAAGCCCCTCCCTGAGTGGTCCCCACCTTTCTGGACGGAACCACGTACTTT 1400 

TTACATACATTGATTCATGTCTCACGTCTCCCTAAAAATGCGTAAGACCAAGCTGTGCCCTGACCACCCT 1470 

GGGCCCCTGTCGTCAGGACCTCCTGAGGCTTTGGCAAATAAACCTCCTAAAATGATAAAAAAAAAA 1535 

FIG. 9C 

-717 

AP2 

human CTGCAGCTCCGGAACGGGGGGGGGCTGCTC 

Sp1 Sp1 Sp1 

human TCCACCGCCCCTGTGCGGCCGCCCGGGAAAGTGCAGGCGGGCCGGGCGCGGTGGCTCACG 

-600 

AP2 Sp1 MyoD 

human CCTGTGATCTCAGCACTTTGGGAGGCCGAGGTGGGCGGATCACCTGAGGTCGGGAGTTCG 

AP2 -550 

human AGGCCAGCCTGCCCAACATGGAGAAACCCTGTCTCTACTAAAGATACAAAATTAGCCAGG 

-450 

ENKCRE-2 

human CGTGGTGACGCATGCCTGTAATCCCAGCTACTGGAGTGGCTGAGGCAGGAGAATCGCTTG 

-400 

human AGCCCGGGAGACAGAGGTTGCGGTGAGCTGAGATCGCACCATTGCACTCCAGCCTGGGCA 

-350 

Sp1 AP2 

human ACAAGAGCGAAACTCAGAAAAAAAAGAAAAGAAAGTGCAGG-GGACCCGCCGTCGGGGTG 
mouse CCCCTCTGCCGCCCTCATGCGGCCACCTGTGGAAGTGAAGGCACAGCTCTAGTCAGCGAG 

Sp1 AP1 

Sp1 -300 -350 

human G-GGGCGGCGCTGCCCAGCCTCTGTCCCACTTCCATGCACTTGACC-TCGAC-- 
mouse GTGGGCGGGGCAACCTAG-GACTGGCAGATTTCCATGCACTTGACCCACCATGGTGACCC 

Sp1 NF1 

-250 -300 

human -CCTCCGGCCTCCGTCTGCGATCTTCCCGTGCCTGAATATGAGGCTTGGAACAGACCCAG 
mouse ACCTCCAGCTTTTAGCTTCAGCCTTCCCGTACATAGAACCGGGGCCTGGAACCTTCCCAG 

-250 

-200 Adh1 AP2 

PEA3 AP2 Sp1 Sp1 Sp1 

mouse ACCTTCCCTCCCCATCTGTAATGACTGTGTTCCCGGGTCCCTGCCTCA CCTCT 

-150 -200 PEA3 

mouse AGCCTCTGATTCTCTGCCTCCTACAAAGT-GGGGGTCGGGCTGGGAAAGCCCCCTGGGAA 

NFkB NFkB 

-150 

-50 -100 -1 

TATA 

human TGGGCGC-CTC-TATTTAAGCGGCTTCCCC-GCGGCCTCGGGACAGAGGGGACTGAGCATG 

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